Replace a disk in RAID
Let's say the server has 2 disks: /dev/sda and /dev/sdb. These disks are assembled into a software RAID1 using the mdadm utility (mdadm --assemble).
One of the disks has failed, for example /dev/sdb. The damaged disk must be replaced.
Before replacing the disk, it is advisable to remove it from the array.
Remove a disk from the array
Check which partitions make up the arrays:
cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3[0] sdb3[1]
975628288 blocks super 1.2 [2/2] [UU]
bitmap: 3/8 pages [12KB], 65536KB chunk
md0 : active raid1 sda2[2] sdb2[1]
999872 blocks super 1.2 [2/2] [UU]
unused devices: <none>
In this case, the arrays are assembled so that md0 consists of sda2 and sdb2, and md1 consists of sda3 and sdb3.
On this server, md0 is /boot, and md1 holds swap and root.
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 985M 1 loop
sda 8:0 0 931.5G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 977M 0 part
│ └─md0 9:0 0 976.4M 0 raid1
└─sda3 8:3 0 930.6G 0 part
└─md1 9:1 0 930.4G 0 raid1
├─vg0-swap_1 253:0 0 4.8G 0 lvm
└─vg0-root 253:1 0 925.7G 0 lvm /
sdb 8:16 0 931.5G 0 disk
├─sdb1 8:17 0 1M 0 part
├─sdb2 8:18 0 977M 0 part
│ └─md0 9:0 0 976.4M 0 raid1
└─sdb3 8:19 0 930.6G 0 part
└─md1 9:1 0 930.4G 0 raid1
├─vg0-swap_1 253:0 0 4.8G 0 lvm
└─vg0-root 253:1 0 925.7G 0 lvm /
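The same mapping between a disk's partitions and the arrays can be read from /proc/mdstat mechanically. The following is only a minimal sketch and not part of the original procedure: the function name list_md_members is hypothetical, and it assumes that member names start with the disk name (as with sdb2 and sdb3).

```shell
# list_md_members DISK [MDSTAT_FILE]
# Print "/dev/mdX /dev/PART" for every partition of DISK that is an array member.
# Hypothetical helper; MDSTAT_FILE defaults to /proc/mdstat.
list_md_members() {
    awk -v disk="$1" '
        # Array lines look like: md1 : active raid1 sda3[0] sdb3[1]
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i !~ /\[[0-9]+\]/) continue   # skip "active", "raid1", ...
                part = $i
                sub(/\[.*/, "", part)              # sdb3[1] or sdb3[1](F) -> sdb3
                if (index(part, disk) == 1)
                    print "/dev/" $1, "/dev/" part
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

Called as list_md_members sdb, it should print one array and partition pair per line, which are the pairs to pass to mdadm when removing the disk.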
Remove the sdb partitions from all the arrays:
mdadm /dev/md0 --remove /dev/sdb2
mdadm /dev/md1 --remove /dev/sdb3
If the partitions are not removed from the array, mdadm does not consider the disk faulty and is still using it, so the removal fails with an error saying the device is in use.
In this case, mark the partitions as failed before removing them:
mdadm /dev/md0 -f /dev/sdb2
mdadm /dev/md1 -f /dev/sdb3
Then run the commands to remove the partitions from the array again.
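The fail-then-remove order can be wrapped in a small helper so that the two steps are never swapped. This is a sketch under assumptions: the names run and fail_and_remove and the DRY_RUN variable are hypothetical; only the mdadm --fail and --remove options come from the steps above.

```shell
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# fail_and_remove ARRAY PARTITION
# Mark the partition as failed first, then remove it from the array.
# The removal is attempted only if marking it failed succeeded.
fail_and_remove() {
    run mdadm "$1" --fail "$2" &&
    run mdadm "$1" --remove "$2"
}
```

With DRY_RUN=1 set, fail_and_remove /dev/md0 /dev/sdb2 only prints the two mdadm commands, which is a safe way to review them before running them for real.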
After removing the failed disk from the array, request a replacement disk: file a ticket stating the serial number (s/n) of the failed disk. Whether downtime is required depends on the server configuration.
Determine the partition table (GPT or MBR) and transfer it to the new disk
After replacing the damaged disk, add the new disk to the array. To do this, first determine the partition table type (GPT or MBR) using gdisk.
Install gdisk:
apt-get install gdisk -y
Execute the command:
gdisk -l /dev/sda
Here /dev/sda is the healthy disk that is still in the RAID.
For MBR, the output will be something like the following:
Partition table scan:
MBR: MBR only
BSD: not present
APM: not present
GPT: not present
For GPT, the output will be roughly as follows:
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
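The two outputs above can be told apart in a script by looking at the "GPT:" and "MBR:" lines of the scan. The helper below is a minimal sketch; the name table_type is hypothetical, and it relies only on the two line patterns shown above.

```shell
# table_type: read "gdisk -l" output on stdin and print gpt, mbr or unknown.
# Hypothetical helper based on the "Partition table scan" lines shown above.
table_type() {
    awk '
        /^[[:space:]]*GPT: present/  { gpt = 1 }
        /^[[:space:]]*MBR: MBR only/ { mbr = 1 }
        END {
            if (gpt)      print "gpt"
            else if (mbr) print "mbr"
            else          print "unknown"
        }
    '
}
```

A typical call would be gdisk -l /dev/sda | table_type, run against the healthy disk.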
Before adding the disk to the array, create partitions on it exactly as they are on sda. How this is done depends on the partition table type.
Copy partitioning for GPT
To copy the GPT partitioning:
sgdisk -R /dev/sdb /dev/sda
Here, the disk to which the partitioning is copied is written first, and the disk from which it is copied is written second (i.e., the copy goes from sda to sdb). If you mix them up, the partitioning on the healthy disk will be destroyed.
The second way to copy the partition table:
sgdisk --backup=table /dev/sda
sgdisk --load-backup=table /dev/sdb
After copying, assign new random GUIDs to the disk and its partitions:
sgdisk -G /dev/sdb
Copy partitioning for MBR
To copy the MBR partitioning:
sfdisk -d /dev/sda | sfdisk /dev/sdb
Here, the disk from which the partitioning is copied is written first, and the disk to which the partitioning is copied is written second.
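If the partitions are not visible in the system, you can reread the partition table with the command:
sfdisk -R /dev/sdb
Newer sfdisk releases no longer have the -R option; there, blockdev --rereadpt /dev/sdb rereads the partition table instead.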
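Because the argument order differs between sgdisk and sfdisk, a wrapper that always takes the source first and the target second can reduce the risk of overwriting the healthy disk. This is a sketch under assumptions: the function name copy_partition_table and the DRY_RUN variable are hypothetical, while the sgdisk and sfdisk invocations are the ones from this section.

```shell
# copy_partition_table SRC DST TYPE
# SRC is the healthy disk, DST the new disk, TYPE is gpt or mbr.
# With DRY_RUN=1 the commands are only printed, not executed.
copy_partition_table() {
    src=$1 dst=$2 type=$3
    if [ "$src" = "$dst" ]; then
        echo "source and target are the same disk: $src" >&2
        return 1
    fi
    case $type in
        gpt)
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "sgdisk -R $dst $src"      # sgdisk takes the target first
                echo "sgdisk -G $dst"
            else
                sgdisk -R "$dst" "$src" && sgdisk -G "$dst"
            fi ;;
        mbr)
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "sfdisk -d $src | sfdisk $dst"
            else
                sfdisk -d "$src" | sfdisk "$dst"
            fi ;;
        *)
            echo "unknown partition table type: $type" >&2
            return 1 ;;
    esac
}
```

Running it once with DRY_RUN=1, for example copy_partition_table /dev/sda /dev/sdb gpt, shows the exact commands so the direction of the copy can be checked before anything is written.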
Add a disk to the array
Once the partitions on /dev/sdb are created, you can add the disk to the array:
mdadm /dev/md0 -a /dev/sdb2
mdadm /dev/md1 -a /dev/sdb3
After the disk is added to the array, synchronization should start. Its speed depends on the size and type of the disk (SSD or HDD):
cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3[1] sdb3[0]
975628288 blocks super 1.2 [2/1] [U_]
[============>........] recovery = 64.7% (632091968/975628288) finish=41.1min speed=139092K/sec
bitmap: 3/8 pages [12KB], 65536KB chunk
md0 : active raid1 sda2[2] sdb2[1]
999872 blocks super 1.2 [2/2] [UU]
unused devices: <none>
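To follow the synchronization from a script, the progress percentage can be extracted from the same output. A minimal sketch, assuming the "recovery = NN.N%" (or "resync = NN.N%") line format shown above; the function name recovery_progress is hypothetical.

```shell
# recovery_progress [MDSTAT_FILE]
# Print "mdX NN.N%" for every array that is currently recovering or resyncing.
# Hypothetical helper; MDSTAT_FILE defaults to /proc/mdstat.
recovery_progress() {
    awk '
        # Remember the array that the following status lines belong to.
        $1 ~ /^md/ && $2 == ":" { array = $1 }
        # Progress line: [====>....] recovery = 64.7% (...) finish=... speed=...
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print array, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

When no array is rebuilding, the function prints nothing, so an empty result can serve as the signal that synchronization has finished.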
Install the bootloader
After adding a disk to the array, you need to install a bootloader on it.
If the server is booted in normal mode or in infiltrate-root mode, this is done with the command:
grub-install /dev/sdb
If the server is booted into Recovery or Rescue mode (i.e. from a live CD), install the bootloader as follows:
- Mount the root file system in /mnt:
mount /dev/md2 /mnt
- Mount boot:
mount /dev/md0 /mnt/boot
- Mount /dev, /proc and /sys:
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
- Change root into the mounted system with chroot:
chroot /mnt
- Install grub on sdb:
grub-install /dev/sdb
Then try booting into normal mode.
Replace the disk if it fails
A disk in the array can be manually marked as failed using the --fail (-f) option:
mdadm /dev/md0 --fail /dev/sda1
or
mdadm /dev/md0 -f /dev/sda1
A failed disk can be removed from the array with the --remove (-r) option:
mdadm /dev/md0 --remove /dev/sda1
or
mdadm /dev/md0 -r /dev/sda1
You can add a new disk to the array using the --add (-a) and --re-add options:
mdadm /dev/md0 --add /dev/sda1
or
mdadm /dev/md0 -a /dev/sda1
Error when restoring bootloader after replacing disk in RAID1
When installing grub, the following error may occur:
root # grub-install --root-directory=/boot /dev/sda
Could not find device for /boot/boot: not found or not a block device
In that case, execute:
root # grep -v rootfs /proc/mounts > /etc/mtab