Replace a disk in RAID

Suppose the server has two disks, /dev/sda and /dev/sdb, assembled into a software RAID1 array with the mdadm utility.

One of the disks, in this example /dev/sdb, has failed and must be replaced.

It is advisable to remove the failed disk from the array before physically replacing it.
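
To confirm which disk the array itself considers failed, you can query the array details (assuming the array names from this example); a failed member is listed as faulty:

mdadm --detail /dev/md0
mdadm --detail /dev/md1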

Remove a disk from the array

Check which partitions make up the arrays:

cat /proc/mdstat

Personalities : [raid1]
md1 : active raid1 sda3[0] sdb3[1]
      975628288 blocks super 1.2 [2/2] [UU]
      bitmap: 3/8 pages [12KB], 65536KB chunk

md0 : active raid1 sda2[2] sdb2[1]
      999872 blocks super 1.2 [2/2] [UU]

unused devices: <none>

In this case, md0 consists of sda2 and sdb2, and md1 consists of sda3 and sdb3. The output above shows a healthy state: [UU] means both members of an array are active; a failed member is flagged with (F) and the status changes to [U_].

On this server, md0 is /boot, and md1 is an LVM physical volume holding swap and the root file system.

lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0              7:0    0   985M  1 loop
sda                8:0    0 931.5G  0 disk
├─sda1             8:1    0     1M  0 part
├─sda2             8:2    0   977M  0 part
│ └─md0            9:0    0 976.4M  0 raid1
└─sda3             8:3    0 930.6G  0 part
  └─md1            9:1    0 930.4G  0 raid1
    ├─vg0-swap_1 253:0    0   4.8G  0 lvm
    └─vg0-root   253:1    0 925.7G  0 lvm   /
sdb                8:16   0 931.5G  0 disk
├─sdb1             8:17   0     1M  0 part
├─sdb2             8:18   0   977M  0 part
│ └─md0            9:0    0 976.4M  0 raid1
└─sdb3             8:19   0 930.6G  0 part
  └─md1            9:1    0 930.4G  0 raid1
    ├─vg0-swap_1 253:0    0   4.8G  0 lvm
    └─vg0-root   253:1    0 925.7G  0 lvm   /

Remove the sdb partitions from both arrays:

mdadm /dev/md0 --remove /dev/sdb2
mdadm /dev/md1 --remove /dev/sdb3

If mdadm does not yet consider the disk faulty, it keeps the partitions in use, and the remove commands will fail with an error saying the device is busy.

In this case, mark the disk as failed before removing it:

mdadm /dev/md0 -f /dev/sdb2
mdadm /dev/md1 -f /dev/sdb3

Then run the removal commands again.
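
Both steps can also be combined, since mdadm accepts --fail and --remove in a single invocation:

mdadm /dev/md0 --fail /dev/sdb2 --remove /dev/sdb2
mdadm /dev/md1 --fail /dev/sdb3 --remove /dev/sdb3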

After removing the failed disk from the array, request a replacement by creating a ticket that includes the serial number of the failed disk. Whether downtime is required depends on the server configuration.
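
The serial number can usually be read with smartctl from the smartmontools package, assuming the package is installed and the failed disk still responds:

smartctl -i /dev/sdb | grep -i serial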

Determine the partition table type (GPT or MBR) and copy it to the new disk

After the damaged disk has been replaced, the new disk must be added to the array. To do this, first determine the partition table type: GPT or MBR. The gdisk utility is used for this purpose.

Install gdisk:

apt-get install gdisk -y

Execute the command:

gdisk -l /dev/sda

Where /dev/sda is the healthy disk that is still in the RAID.

For MBR, the output will be something like the following:

Partition table scan:
MBR: MBR only
BSD: not present
APM: not present
GPT: not present

For GPT, it is roughly as follows:

Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
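
On reasonably recent util-linux versions, the partition table type can also be read directly with lsblk, which prints gpt or dos:

lsblk -ndo PTTYPE /dev/sda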

Before adding the disk to the array, create partitions on it exactly matching those on sda. How this is done depends on the partition table type.

Copy partitioning for GPT

To copy the GPT partitioning:

sgdisk -R /dev/sdb /dev/sda

Note the order: the disk to which the layout is copied is written first, and the disk from which it is copied second (that is, from sda to sdb). If you mix them up, the partition table on the healthy disk will be destroyed.

The second way to copy the partition table:

sgdisk --backup=table /dev/sda
sgdisk --load-backup=table /dev/sdb

After copying, assign a new random GUID to the disk and its partitions:

sgdisk -G /dev/sdb
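
Before adding the partitions to the arrays, it is worth checking that the copied layout matches the source, for example by printing both partition tables with sgdisk and comparing them:

sgdisk -p /dev/sda
sgdisk -p /dev/sdb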

Copy partitioning for MBR

To copy the MBR partitioning:

sfdisk -d /dev/sda | sfdisk /dev/sdb

Here the order is the opposite of sgdisk: the source disk is written first, and the destination disk second.

If the new partitions are not visible in the system, re-read the partition table with the command:

sfdisk -R /dev/sdb
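
Recent versions of sfdisk may no longer support -R; in that case, ask the kernel to re-read the partition table with blockdev (part of util-linux) or partprobe (part of parted):

blockdev --rereadpt /dev/sdb
partprobe /dev/sdb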

Add a disk to the array

Once the partitions have been created on /dev/sdb, add them to the arrays:

mdadm /dev/md0 -a /dev/sdb2
mdadm /dev/md1 -a /dev/sdb3

Once the disk has been added to the array, synchronization begins. Its speed depends on the size and type of the disks (SSD/HDD):

cat /proc/mdstat

Personalities : [raid1]
md1 : active raid1 sda3[1] sdb3[0]
      975628288 blocks super 1.2 [2/1] [U_]
      [============>........]  recovery = 64.7% (632091968/975628288) finish=41.1min speed=139092K/sec
      bitmap: 3/8 pages [12KB], 65536KB chunk

md0 : active raid1 sda2[2] sdb2[1]
      999872 blocks super 1.2 [2/2] [UU]

unused devices: <none>
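
If the rebuild is too slow, the kernel's resync speed limits (in KB/s) can be raised for the duration of the synchronization; the values below are only an example:

sysctl -w dev.raid.speed_limit_min=100000
sysctl -w dev.raid.speed_limit_max=500000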

Install the bootloader

After adding a disk to the array, you need to install a bootloader on it.

If the server is booted in normal mode, or you have already chrooted into the installed system, this is done with a single command:

grub-install /dev/sdb
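
On Debian-based systems it is also reasonable to regenerate the grub configuration afterwards:

update-grub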

If the server is booted into Recovery or Rescue mode (that is, from a live CD), install the bootloader as follows:

  1. Activate LVM and mount the root file system to /mnt (in this example, root is the vg0-root logical volume on top of md1):

    vgchange -ay
    mount /dev/mapper/vg0-root /mnt
  2. Mount boot:

    mount /dev/md0 /mnt/boot
  3. Mount /dev, /proc, and /sys:

    mount --bind /dev /mnt/dev
    mount --bind /proc /mnt/proc
    mount --bind /sys /mnt/sys
  4. Execute chroot to the mounted system:

    chroot /mnt
  5. Install grub on sdb:

    grub-install /dev/sdb
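
Before rebooting, leave the chroot and unmount the file systems in reverse order (a cleanup sketch using the mount points from the steps above):

exit
umount /mnt/sys /mnt/proc /mnt/dev /mnt/boot
umount /mnt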

Then try booting into normal mode.

Replace the disk if it fails

A disk in the array can be manually marked as failed with the --fail (-f) option:

mdadm /dev/md0 --fail /dev/sda1

or

mdadm /dev/md0 -f /dev/sda1

You can remove a failed disk with the --remove (-r) option:

mdadm /dev/md0 --remove /dev/sda1

or

mdadm /dev/md0 -r /dev/sda1

You can add a new disk to the array with the --add (-a) option, or return a former member with --re-add:

mdadm /dev/md0 --add /dev/sda1

or

mdadm /dev/md0 -a /dev/sda1
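
Unlike --add, --re-add asks mdadm to return a device that was recently a member of the array; if a write-intent bitmap is present (as on md1 in this example), only the blocks changed since the device dropped out are resynchronized:

mdadm /dev/md0 --re-add /dev/sda1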

Error when restoring bootloader after replacing disk in RAID1

The following error may occur when installing grub:

root # grub-install --root-directory=/boot /dev/sda
Could not find device for /boot/boot: not found or not a block device

In that case, recreate /etc/mtab (which grub-install consults and which may be missing in a chroot) from the kernel's mount table, then retry:

root # grep -v rootfs /proc/mounts > /etc/mtab