Diagnose and replace the defective disk
You can check the condition of a disk using its SMART (Self-Monitoring, Analysis and Reporting Technology) attributes. If the results show that the disk is faulty, you can have it replaced.
Check disk condition
1. Get SMART attributes
The way to obtain SMART attributes depends on the operating system installed on the server and on how the disk is connected to it:
- without a RAID controller: the disk is connected directly to the motherboard or through an HBA controller;
- via a RAID controller: the disk is connected through an Adaptec or MegaRAID controller installed on the server.
Linux, without RAID controller
- Install the smartmontools package, a set of utilities for monitoring the health of SMART-enabled HDDs and SSDs:
apt-get install smartmontools
- Display information about the disks:
lsblk
The response lists the disks. Note or copy the disk names. For example:
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda             8:0    0   1.8T  0 disk
└─sda1          8:1    0   1.8T  0 part /mnt/data
sdb             8:16   0 931.5G  0 disk
└─sdb1          8:17   0 931.5G  0 part /mnt/backup
nvme0n1       259:0    0 465.8G  0 disk
├─nvme0n1p1   259:1    0   512M  0 part /boot/efi
├─nvme0n1p2   259:2    0    16G  0 part [SWAP]
└─nvme0n1p3   259:3    0 449.3G  0 part /
Here sda, sdb and nvme0n1 are the disk names.
- Read the SMART attributes. The command depends on the disk interface:
  - for SATA:
smartctl -iA /dev/<disk_name>
where <disk_name> is the disk name you copied from the lsblk output;
  - for NVMe:
nvme smart-log /dev/<disk_name>
where <disk_name> is the disk name you copied from the lsblk output. The nvme utility is provided by the nvme-cli package (apt-get install nvme-cli).
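If the server has several disks, the commands above can be run for each of them in one pass. The following is a minimal sketch, not part of the official procedure: it assumes that every disk whose name starts with nvme is an NVMe drive and that all other disks are read with smartctl. The helper names smart_cmd_for and read_all_smart are hypothetical.

```shell
#!/bin/sh
# Sketch: read SMART data for every whole disk on the server.
# smart_cmd_for and read_all_smart are hypothetical helper names.

# Print the command that reads SMART data for one disk name:
# names starting with "nvme" are assumed to be NVMe drives (nvme-cli),
# all other disks are read with smartctl.
smart_cmd_for() {
  case "$1" in
    nvme*) echo "nvme smart-log /dev/$1" ;;
    *)     echo "smartctl -iA /dev/$1" ;;
  esac
}

# Loop over whole disks only (-d: no partitions, -n: no header line) and
# skip loop and rom devices by keeping rows whose TYPE is "disk".
# Needs root, smartmontools and, for NVMe drives, nvme-cli.
read_all_smart() {
  for disk in $(lsblk -d -n -o NAME,TYPE | awk '$2 == "disk" { print $1 }'); do
    echo "=== /dev/$disk ==="
    # Unquoted on purpose: the printed command is split into words.
    $(smart_cmd_for "$disk")
  done
}
```

To use it, load the functions into a root shell and run read_all_smart.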
2. Assess SMART attributes
A disk is considered faulty if at least one of its SMART attributes meets the conditions specified for its type: HDD, SSD or NVMe drive.
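The exact conditions for HDD, SSD and NVMe drives are not reproduced in this section. As an illustration only, the sketch below assumes a simplified rule: a SATA disk is treated as suspect if the raw value of Reallocated_Sector_Ct, Current_Pending_Sector or Offline_Uncorrectable in the smartctl output is nonzero. Replace the attribute list and the comparison with the documented conditions for your drive type. raw_value and is_suspect are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: check attributes in `smartctl -A` (or -iA) output read from stdin.
# raw_value and is_suspect are hypothetical helpers; the attribute list in
# is_suspect is an illustrative assumption, not the documented criteria.

# smartctl prints attribute rows like:
#   5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
# The attribute name is field 2 and RAW_VALUE is field 10.
raw_value() {
  awk -v name="$1" '$2 == name { print $10; exit }'
}

# Return 0 (suspect) and print the first attribute with a nonzero raw value.
is_suspect() {
  input=$(cat)
  for attr in Reallocated_Sector_Ct Current_Pending_Sector Offline_Uncorrectable; do
    value=$(printf '%s\n' "$input" | raw_value "$attr")
    if [ -n "$value" ] && [ "$value" != "0" ]; then
      echo "$attr=$value"
      return 0
    fi
  done
  return 1
}
```

For example: smartctl -A /dev/sda | is_suspect && echo "/dev/sda needs attention".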
Replace a defective disk
You can determine whether a disk is faulty by checking its condition. If the SMART attribute assessment shows that the disk is faulty, you can initiate a replacement. To do so:
- Get the serial number of the faulty disk.
- Coordinate the replacement of the disk.
- If the disk is added to a RAID array, remove the disk from the RAID array.
- Illuminate the disk so that the engineers can locate it in the server.
- Check that the new disk is detected by the system.
- If the disk was in a RAID array, add the disk to the RAID array.
1. Get the serial number of the defective disk
Linux, without RAID controller
- To get the serial number of the faulty disk, print the disk information:
lsblk -o name,serial,model
The response lists the disks. Copy the serial number of the faulty disk. For example:
NAME      SERIAL            MODEL
sdb       S0H0N0XYZ123456   Samsung SSD 970 EVO Plus 500GB
nvme0n1   S0D0NX0M001234    Samsung SSD 980 PRO 1TB
Here SERIAL is the column with the disk serial numbers.
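When there are many disks, it can be convenient to extract the serial number of one specific disk. Below is a minimal sketch with a hypothetical helper serial_of that reads lsblk output and prints the SERIAL column for the given NAME; it assumes the SERIAL column is not empty. lsblk can also print the serial of one device directly, for example lsblk -d -n -o SERIAL /dev/sdb.

```shell
#!/bin/sh
# Sketch: print the serial number of one disk from `lsblk -o name,serial,model`
# style output on stdin. serial_of is a hypothetical helper name.
# The NAME column is matched exactly, so "sdb" does not match partitions.
# Assumption: SERIAL is not empty, otherwise field 2 is the first word of MODEL.
serial_of() {
  awk -v disk="$1" '$1 == disk { print $2; found = 1; exit } END { exit !found }'
}
```

For example, lsblk -d -o name,serial,model | serial_of sdb prints the serial of sdb and exits with a nonzero status if there is no such disk.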
2. Coordinate disk replacement
- Create a ticket. In the ticket specify:
- If the disk replacement is agreed upon, a Selectel employee will arrange a convenient time and the duration of the work with you. You will need the duration of the work to plan when to illuminate the disk.
3. Remove a disk from the RAID array
If the disk is in a RAID array, remove the disk from the array.
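The section does not show how to remove the disk from the array. Assuming Linux software RAID managed by mdadm (arrays behind an Adaptec or MegaRAID controller are managed with the controller's own tools instead), the sketch below finds which arrays use the faulty disk by parsing /proc/mdstat. md_members_of is a hypothetical helper; the mdadm commands in the comments mark a member as failed and remove it, with /dev/md0 and /dev/sdb1 as example names.

```shell
#!/bin/sh
# Sketch, assuming Linux software RAID (mdadm): list the md arrays that use
# the given disk or one of its partitions, reading /proc/mdstat-style text
# on stdin. md_members_of is a hypothetical helper name.
# Usage:  md_members_of sdb < /proc/mdstat
# Then, for each printed "array member" pair (example names):
#   mdadm --manage /dev/md0 --fail /dev/sdb1
#   mdadm --manage /dev/md0 --remove /dev/sdb1
md_members_of() {
  disk=$1
  # Partitions of disks whose name ends in a digit (nvme0n1) use a "p" separator.
  case "$disk" in
    *[0-9]) re="^${disk}(p[0-9]+)?\$" ;;
    *)      re="^${disk}([0-9]+)?\$" ;;
  esac
  awk -v re="$re" '
    $1 ~ /^md[0-9]+$/ && $2 == ":" {
      for (i = 3; i <= NF; i++) {
        dev = $i
        sub(/\[.*/, "", dev)   # drop the role suffix: "sdb1[1](F)" -> "sdb1"
        if (dev ~ re) print $1, dev
      }
    }'
}
```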
4. Illuminate the disk
At the time scheduled for the work, we will notify you in the ticket that we are ready to proceed with the disk replacement.
If the disk does not light up and the engineers cannot identify it by its serial number, the server will have to be shut down to replace the disk. In this case, we will report the problem in the ticket and agree with you on a time to shut down the server.
Linux, without RAID controller
To illuminate a disk, put a load on it, for example by running a read or write operation. If the disk is removed while the operation is running, read errors will appear. This is expected: the command is trying to access data on a disk that has already been removed.
- Display information about the disks:
lsblk
The response lists the disks. Note or copy the name of the faulty disk. For example:
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda             8:0    0   1.8T  0 disk
└─sda1          8:1    0   1.8T  0 part /mnt/data
sdb             8:16   0 931.5G  0 disk
└─sdb1          8:17   0 931.5G  0 part /mnt/backup
nvme0n1       259:0    0 465.8G  0 disk
├─nvme0n1p1   259:1    0   512M  0 part /boot/efi
├─nvme0n1p2   259:2    0    16G  0 part [SWAP]
└─nvme0n1p3   259:3    0 449.3G  0 part /
Here sda, sdb and nvme0n1 are the disk names.
- Illuminate the disk by reading it:
dd if=/dev/<disk_name> of=/dev/null
where <disk_name> is the disk name you copied from the lsblk output. The command only reads the disk and discards the data, so the disk contents are not changed.
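The dd command stops after reading the whole disk once, so depending on the disk size it may finish before the engineers reach the server or keep running long after the work is done. The following sketch assumes GNU coreutils timeout is available and keeps reading the disk in a loop for a fixed number of seconds, for example the duration of the work agreed in the ticket. The helper name illuminate is hypothetical.

```shell
#!/bin/sh
# Sketch: keep reading a disk in a loop for a fixed number of seconds so its
# activity LED keeps blinking. illuminate is a hypothetical helper name.
# Usage: illuminate /dev/sdb 3600   (needs root and GNU coreutils timeout)
illuminate() {
  # timeout stops the whole loop (and the running dd) after $2 seconds and
  # then returns 124. Reads go to /dev/null, so the disk is not modified.
  timeout "$2" sh -c '
    while :; do
      # After the disk is pulled, dd fails; sleep so the loop does not spin.
      dd if="$1" of=/dev/null bs=1M 2>/dev/null || sleep 1
    done' sh "$1"
}
```

An exit status of 124 means the time limit was reached, which is the normal way for this loop to end.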
5. Check the disk in the system
Linux, without RAID controller
- Wait for a message in the ticket from a Selectel employee confirming that the disk has been replaced.
- Check that the new disk is detected by the system:
lsblk
- If the disk is not in the list, reboot the server. If the disk is still not detected after the reboot, report it in the ticket.
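To check not only that a disk is present but also that it is the new one, you can compare its serial number with the one you recorded in step 1. Below is a minimal sketch with a hypothetical helper disk_replaced. It assumes the new disk gets the same name as the old one, which is not guaranteed; if the name is not found, look through the full lsblk list.

```shell
#!/bin/sh
# Sketch: confirm that a disk is present and no longer reports the old serial.
# Reads `lsblk -d -n -o name,serial` output on stdin.
# disk_replaced is a hypothetical helper name.
# Usage: lsblk -d -n -o name,serial | disk_replaced sdb S0H0N0XYZ123456
disk_replaced() {
  awk -v disk="$1" -v old="$2" '
    $1 == disk { found = 1; serial = $2 }
    END {
      if (!found)        { print disk ": not detected"; exit 1 }
      if (serial == old) { print disk ": still has the old serial " old; exit 1 }
      print disk ": detected, serial " serial
    }'
}
```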
6. Add a disk to a RAID array
If the disk was in a RAID array, add the new disk to the array.
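Assuming Linux software RAID managed by mdadm, adding the disk back comes down to adding its partition to the array, after which mdadm rebuilds the data onto it. The sketch assumes the new disk already has the same partition layout as the remaining array member; part_name and add_back are hypothetical helpers, and /dev/md0 and sdb are example names.

```shell
#!/bin/sh
# Sketch, assuming Linux software RAID (mdadm) and that the new disk already
# has the same partition layout as the remaining array member.
# part_name and add_back are hypothetical helper names.

# Build a partition device name: sdb + 1 -> sdb1, nvme0n1 + 1 -> nvme0n1p1
# (names that end in a digit get a "p" separator).
part_name() {
  case "$1" in
    *[0-9]) echo "${1}p$2" ;;
    *)      echo "${1}$2" ;;
  esac
}

# Add partition N of DISK back to ARRAY, e.g.: add_back /dev/md0 sdb 1
# Needs root; watch the rebuild with: cat /proc/mdstat
add_back() {
  mdadm --manage "$1" --add "/dev/$(part_name "$2" "$3")"
}
```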