Rebuild a Software RAID Array
This guide shows how to remove a failed hard drive from a Linux RAID 1 array (software RAID) and how to add a new hard disk to the array without losing data.
In this example, the drive /dev/sdb has been replaced. Be careful when following this guide, and remember to specify the drives and partitions that apply to your own scenario!
Log into your server through SSH. If your server will not boot, you must Reboot Your Server into Rescue Mode.
Once logged in, first check the status of the RAID array by using the command cat /proc/mdstat
This will output the multi-disk status. Below is an example of two properly functioning RAID arrays.
The above example shows two healthy RAID 1 arrays. Each array has a status of [2/2] and [UU], which means that out of 2 devices in the array, 2 are functional and both are up.
In the example below, a failed drive (sdb) has been replaced with a new blank drive, so the status reads [2/1] and [U_]: out of 2 devices in the array, only 1 is functional and only 1 is up. The sda drive is still functioning, while the sdb drive has been replaced and needs to be added back into the array and rebuilt. The trailing (F) behind sdb3 and sdb1 in the output shows that the sdb drive has failed.
It is also possible that, instead of being marked as failed, one of the devices in each array is not listed at all. In that case, the next step can be skipped.
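The status check described above can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name degraded_arrays is hypothetical, and it assumes the usual /proc/mdstat layout in which each array header line is followed by a status line ending in a member map such as [UU] or [U_].

```shell
# Print the name of every md array whose member map (for example [U_])
# contains an underscore, meaning a device is missing or failed.
# Reads /proc/mdstat-formatted text on standard input.
degraded_arrays() {
    awk '
        # Array header line, e.g. "md3 : active raid1 sda3[0] sdb3[2](F)"
        /^md[0-9]+ :/ { name = $1; next }
        # Status line, e.g. "1458846016 blocks [2/1] [U_]"
        match($0, /\[[U_]+\]/) {
            state = substr($0, RSTART, RLENGTH)
            if (index(state, "_") > 0) print name
        }
    '
}

# Typical use on the server:
#   degraded_arrays < /proc/mdstat
```

If the function prints nothing, every array it recognized reported all members as up.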
We must now remove the failed devices from the array. Please be sure to remove only the failed devices with a trailing (F). If instead of the devices being marked as failed, they are already removed from the array, you can skip this step.
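A failed member is removed with a command of the form mdadm --manage /dev/md1 --remove /dev/sdb1. As a hedged sketch, the helper below (the name failed_member_cmds is hypothetical) reads /proc/mdstat-formatted text and only prints one such command for each member flagged with (F), so the list can be reviewed before anything is run.

```shell
# Print (do not run) an "mdadm --remove" command for every array member
# that /proc/mdstat marks as failed with a trailing (F).
# Reads /proc/mdstat-formatted text on standard input.
failed_member_cmds() {
    awk '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    dev = $i
                    sub(/\[.*/, "", dev)   # "sdb3[2](F)" becomes "sdb3"
                    printf "mdadm --manage /dev/%s --remove /dev/%s\n", $1, dev
                }
            }
        }
    '
}

# Typical use on the server:
#   failed_member_cmds < /proc/mdstat
```

Check that every printed line names a partition on the failed drive only, then run the printed commands as root.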
Next, run the fdisk -l command to list the partitions of all the drives.
If the failed drive (in this example, sdb) has partitions listed, these partitions must be deleted. If your output does not list any partitions for the drive and instead shows Disk /dev/sdb doesn’t contain a valid partition table, you can skip the next step.
To delete the partitions of the failed disk (in this example sdb), we run the fdisk /dev/sdb command. Please make sure to specify the FAILED disk in this command. If at any time you believe you made a mistake, press q and press ENTER to quit without saving changes.
Enter the p command and press ENTER to print the partition tables.
Press the d key and press ENTER to delete a partition and then enter 1 to delete the 1st partition.
Follow the same process for the rest of the partitions.
Issue the p command to print the partition table again and ensure that all partitions have been removed.
Press the w key and press ENTER to write and save the partition table.
We now need to reboot the server so that the deleted partitions take effect and the kernel re-reads the partition tables. Use the shutdown -r now command to reboot the server.
Now we will copy the same partition structure from the good drive (sda) to the blank drive (sdb). The command below may potentially wipe the good drive if used incorrectly! Please make sure the first drive specified is the functional drive and the second drive specified is the blank drive.
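One common way to copy a partition layout is to dump it from the good drive with sfdisk -d and feed the dump to sfdisk on the blank drive. The sketch below assumes that approach. The helper name clone_table_cmd is hypothetical, and it only prints the pipeline after a basic sanity check, so the drive order can be verified before running it.

```shell
# Print (do not run) the sfdisk pipeline that copies the partition table
# from GOOD_DISK to BLANK_DISK. Refuses obviously wrong input.
clone_table_cmd() {
    good=$1
    blank=$2
    if [ -z "$good" ] || [ -z "$blank" ]; then
        echo "usage: clone_table_cmd GOOD_DISK BLANK_DISK" >&2
        return 2
    fi
    if [ "$good" = "$blank" ]; then
        echo "refusing: source and target are the same disk" >&2
        return 1
    fi
    # The dump comes from the good disk; the blank disk is overwritten.
    printf 'sfdisk -d %s | sfdisk %s\n' "$good" "$blank"
}

# Example for this guide's scenario (sda is good, sdb is blank):
#   clone_table_cmd /dev/sda /dev/sdb
```

Afterwards, fdisk -l should show the same partition layout on both drives.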
Now the partition structure of the new drive matches the partition structure of the drive containing data. We will now enable the swap partition.
Run the mkswap command followed by the partition that holds the swap.
Then run the swapon command for the same partition.
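If it is unclear which partition holds the swap, the partition type can help. The sketch below assumes an MBR partition table in which swap uses type 82, and assumes the text format printed by sfdisk -d. The helper name swap_partitions and the /dev/sdb2 result shown in the comments are hypothetical.

```shell
# Print every partition in an "sfdisk -d" dump whose MBR type is 82
# (Linux swap). Handles both the older "Id=82" and newer "type=82" spelling.
# Reads the dump on standard input.
swap_partitions() {
    awk '
        /^\/dev\// && /(Id|type)= *82([, ]|$)/ { print $1 }
    '
}

# Typical use on the server, after the partition table has been copied:
#   sfdisk -d /dev/sdb | swap_partitions
# If that printed /dev/sdb2 (a hypothetical result), the two steps are:
#   mkswap /dev/sdb2
#   swapon /dev/sdb2
```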
Next, we will add the partitions of the new drive to the correct arrays. Once the partitions have been added to the array, the data will be copied over to the new drive, rebuilding the array.
To find out which partitions should be added to which array, use the cat /etc/mdadm.conf command.
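The mapping can also be extracted with a short script. This is a sketch under one important assumption: the ARRAY lines in /etc/mdadm.conf carry a devices= field listing the member partitions. Many configurations identify arrays by UUID only, in which case nothing is printed. The helper name array_members_on is hypothetical.

```shell
# For a given disk name (for example "sdb"), print "ARRAY PARTITION" pairs
# taken from the devices= field of each ARRAY line.
# Reads mdadm.conf-formatted text on standard input.
array_members_on() {
    disk=$1
    awk -v disk="/dev/$disk" '
        $1 == "ARRAY" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^devices=/) {
                    list = substr($i, 9)          # text after "devices="
                    n = split(list, parts, ",")
                    for (j = 1; j <= n; j++)
                        if (index(parts[j], disk) == 1) print $2, parts[j]
                }
            }
        }
    '
}

# Typical use on the server:
#   array_members_on sdb < /etc/mdadm.conf
```

Each printed pair names an array and the partition of the replaced drive that belongs to it.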
Since the sdb drive has been replaced, we will need to add the sdb partitions to the correct arrays. The output from the last step states that /dev/sdb1 should be added to the /dev/md1 array. We will use the command mdadm --manage /dev/md1 --add /dev/sdb1
Next we will check on the RAID status by using the cat /proc/mdstat command again. Since the correct partition has been added back to the correct array, the data should begin copying over to the new drive.
We will now do the same for the second RAID array (md3) by adding the correct sdb partition to the array using the command mdadm --manage /dev/md3 --add /dev/sdb3
Check the RAID array status again using the cat /proc/mdstat command and you should see that the second RAID array resync is DELAYED and will begin as soon as the first array is finished rebuilding.
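Rather than re-running cat /proc/mdstat by hand, the rebuild can be polled from a script. The sketch below assumes that an active or queued rebuild shows up in /proc/mdstat as a line containing "recovery =", "resync =" or "resync=DELAYED". The helper name rebuild_in_progress is hypothetical.

```shell
# Succeed (exit status 0) if the /proc/mdstat-formatted text on standard
# input shows a recovery or resync that is running or delayed.
rebuild_in_progress() {
    grep -Eq '(recovery|resync) *='
}

# Typical use on the server: poll once a minute until all arrays are done.
#   while rebuild_in_progress < /proc/mdstat; do sleep 60; done
#   cat /proc/mdstat
```

mdadm also offers a --wait option (for example mdadm --wait /dev/md1) that blocks until the resync or recovery of the named array has finished.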
You will also need to set up the GRUB bootloader on the hard drive that was replaced, so that if the other drive fails in the future, the server can still boot properly from the new (replaced) drive.
First, check to make sure the md1 array has finished rebuilding by using the cat /proc/mdstat command once again.
Once you have confirmed that the md1 array is finished and you see the [2/2] [UU] status, you can now run the grub command.
If you were performing the previous steps in rescue mode, please boot the server into normal mode through the 1&1 Recovery Tool before following the next steps.
At the GRUB prompt, issue the following commands in order:
This will set up GRUB on the first hard drive (sda) and the second hard drive (sdb). GRUB uses its own designation hd0 for sda and hd1 for sdb.
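As an illustration only, a GRUB legacy session of this kind usually consists of a root and a setup command per drive. The sketch below generates such a command list. It assumes GRUB legacy (the grub shell) and assumes that the boot files live on the first partition of each drive, written (hdN,0) in GRUB notation. Adjust the partition number if your layout differs. The helper name grub_setup_script is hypothetical.

```shell
# Print the GRUB legacy commands that install the bootloader on each
# listed drive number (0 for hd0/sda, 1 for hd1/sdb, and so on).
# Assumption: the boot files are on the first partition, (hdN,0).
grub_setup_script() {
    for n in "$@"; do
        printf 'root (hd%s,0)\nsetup (hd%s)\n' "$n" "$n"
    done
    printf 'quit\n'
}

# Review the generated commands first:
#   grub_setup_script 0 1
# Then either type them at the grub prompt or pipe them in:
#   grub_setup_script 0 1 | grub --batch
```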