Rebuilding Software RAID
Rebuilding an array currently requires us to manually recreate the partition table on the fresh hard drive. First we need to know which drive is still active and which one is new. To see this, run the following command:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0]
      104320 blocks [2/1] [U_]

md1 : active raid1 sdb2[0]
      1052160 blocks [2/1] [U_]

md2 : active raid1 sdb3[0]
      243039232 blocks [2/1] [U_]

unused devices: <none>
To read this, look at the first array, md0: it lists only one member partition (sdb1) and reports [2/1] [U_], meaning one of its two mirrors is missing. That tells us /dev/sdb is the surviving drive and /dev/sda is the replacement. If you are adding partitions back in on a drive that is not empty, you may have to keep track of which partitions belong to which array. For the current purposes we will assume that we are installing a fresh, unpartitioned drive.
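If the mdstat output leaves any doubt about which drive failed, mdadm itself can describe an array. A quick sanity check (assuming mdadm is installed in the usual place):

# mdadm --detail /dev/md0

The surviving member is listed with a state of "active sync", while the missing one shows up as "removed".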
We need to see how the current drive is partitioned, so we can duplicate the same partition table on the new drive:
# fdisk -l /dev/sdb

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14         144     1052257+  fd  Linux raid autodetect
/dev/sdb3             145       30401   243039352+  fd  Linux raid autodetect
... and just to verify that /dev/sda is blank ...
# fdisk -l /dev/sda

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
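As an aside, if sfdisk is available, the whole MBR partition table can usually be cloned in one step instead of recreating each partition by hand; the rest of this section walks through doing it manually with fdisk. A sketch (double-check the source and destination order before running it):

# sfdisk -d /dev/sdb | sfdisk /dev/sda

The -d flag dumps sdb's table in a format sfdisk can replay onto sda, bootable flag and partition types included.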
Now we need to edit the partition table on /dev/sda to match exactly what we see on /dev/sdb.
# fdisk /dev/sda

The number of cylinders for this disk is set to 30401.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
For the start and end cylinder values, refer to the existing partition table: the fdisk -l output above lists "Start" and "End" for each partition. Copy those values exactly as shown and the new partitions will match the old ones.
First cylinder (1-30401, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-30401, default 30401): 13

Command (m for help): p

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1          13      104391   83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (14-30401, default 14): 14
Last cylinder or +size or +sizeM or +sizeK (14-30401, default 30401): 144

Command (m for help): p

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1          13      104391   83  Linux
/dev/sda2              14         144     1052257+  83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (145-30401, default 145): 145
Last cylinder or +size or +sizeM or +sizeK (145-30401, default 30401): 30401
We need to mark the boot partition as bootable, or this drive won't be much use if the other one dies:
Command (m for help): a
Partition number (1-4): 1
Now we need to set the partition type on all of the partitions to 'fd', the hex code for a Linux RAID autodetect partition:
Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)
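If you prefer to script this step rather than answer fdisk's prompts, recent versions of sfdisk can change a partition's type non-interactively (older versions spell the option --change-id; treat this as a sketch to adapt to your util-linux):

# sfdisk --part-type /dev/sda 1 fd
# sfdisk --part-type /dev/sda 2 fd
# sfdisk --part-type /dev/sda 3 fd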
Let's look at the partition table we created; it should be identical to the one from the existing hard drive shown above.
Command (m for help): p

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14         144     1052257+  fd  Linux raid autodetect
/dev/sda3             145       30401   243039352+  fd  Linux raid autodetect
If it looks good, we use w to write the partition table to disk and exit:
Command (m for help): w

The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
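If fdisk instead warns that the kernel could not re-read the partition table (common when the disk is otherwise in use), partprobe, which ships with the parted package, can usually nudge the kernel without a reboot:

# partprobe /dev/sda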
I usually run 'cat /proc/mdstat' again at this point so I can see the current state and compare against it as I add the newly created partitions back into the RAID arrays.
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0]
      104320 blocks [2/1] [U_]

md1 : active raid1 sdb2[0]
      1052160 blocks [2/1] [U_]

md2 : active raid1 sdb3[0]
      243039232 blocks [2/1] [U_]

unused devices: <none>
Now we have to add each of the partitions back into its array, one at a time. I run 'cat /proc/mdstat' again after each addition to make sure it worked.
# mdadm /dev/md0 --add /dev/sda1
mdadm: added /dev/sda1

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[1] sdb1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[0]
      1052160 blocks [2/1] [U_]

md2 : active raid1 sdb3[0]
      243039232 blocks [2/1] [U_]

unused devices: <none>

# mdadm /dev/md1 --add /dev/sda2
mdadm: added /dev/sda2

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[1] sdb1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[2] sdb2[0]
      1052160 blocks [2/1] [U_]
      [========>............]  recovery = 43.5% (458752/1052160) finish=0.1min speed=76458K/sec

md2 : active raid1 sdb3[0]
      243039232 blocks [2/1] [U_]

unused devices: <none>

# mdadm /dev/md2 --add /dev/sda3
mdadm: added /dev/sda3

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[1] sdb1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[1] sdb2[0]
      1052160 blocks [2/2] [UU]

md2 : active raid1 sda3[2] sdb3[0]
      243039232 blocks [2/1] [U_]
      [>....................]  recovery =  0.1% (308480/243039232) finish=78.6min speed=51413K/sec

unused devices: <none>
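Rather than re-running the command by hand during the long resync, watch can repaint the status every two seconds by default, and mdadm can block until a given array finishes rebuilding, which is handy in scripts:

# watch cat /proc/mdstat
# mdadm --wait /dev/md2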
The last step is to update/refresh the GRUB configuration on both drives. These commands are run against the whole drive (e.g. /dev/sda, not /dev/sda1) for each member of the RAID array; the /dev/sdX block below stands in for any additional members:
# /sbin/grub

grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)

grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)

grub> device (hd0) /dev/sdX
grub> root (hd0,0)
grub> setup (hd0)
NOTE: the device name changes, but the GRUB value (hd0) does not. This ensures that each drive can boot the system on its own in a failure situation.
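These commands are for the legacy GRUB shell shown above. If your distribution has moved to GRUB 2, there is no grub shell; the rough equivalent (assuming grub-install is on the path) is to install the boot loader onto each member drive directly:

# grub-install /dev/sda
# grub-install /dev/sdb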
And that is it; the rebuild is underway, as the recovery lines above show. In this case it should take almost an hour and a half to rebuild the largest partition (finish=78.6min), with the smaller ones done almost as fast as you can type the commands.
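If the resync is slower than you would like, the kernel throttles rebuild bandwidth between two tunables, both in KB/s. They can be inspected and raised with sysctl; pick a ceiling your disks can actually sustain:

# sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
# sysctl -w dev.raid.speed_limit_min=50000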