Replacing a drive in a ZFS array in Ubuntu
Sooner or later, every system will have a drive failure.
ZFS was designed with this in mind.
I noticed error messages in my /var/log/kern.log file about one of the drives in my ZFS pool.
Dec 8 07:11:17 nas kernel: [738110.398391] ata8.00: exception Emask 0x0 SAct 0x80000004 SErr 0x0 action 0x0
Dec 8 07:11:17 nas kernel: [738110.398437] ata8.00: irq_stat 0x40000008
Dec 8 07:11:17 nas kernel: [738110.398460] ata8.00: failed command: READ FPDMA QUEUED
Dec 8 07:11:17 nas kernel: [738110.398490] ata8.00: cmd 60/e0:10:00:b9:08/07:00:07:00:00/40 tag 2 ncq dma 1032192 in
Dec 8 07:11:17 nas kernel: [738110.398490] res 41/40:00:90:be:08/00:00:07:00:00/40 Emask 0x409 (media error) <F>
Dec 8 07:11:17 nas kernel: [738110.398564] ata8.00: status: { DRDY ERR }
Dec 8 07:11:17 nas kernel: [738110.398585] ata8.00: error: { UNC }
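If you want to scan the log yourself for this kind of ATA error, a grep along these lines should do it (the pattern is only a starting point; adjust the path and expression to whatever your kernel actually logs):
grep -iE "ata[0-9]+.*(failed command|media error)" /var/log/kern.log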
Considering I've had these drives spinning 24x7 for almost 10 years now, I can't complain that one of them finally started having issues.
Given that all 4 drives in the NAS are coming up on 10 years old, I figure the other 3 will probably start having issues sooner rather than later.
In addition, I've been looking at upgrading drives to increase the space available on the NAS.
My NAS currently has four 4TB drives in a raidz1 configuration, which gives me roughly 12TB of usable space.
I've decided to upgrade the drives, one at a time as my budget allows, to 10TB drives.
The name of my zpool, the GUID, the controller, and the drive will be different on your system.
Verify and change the commands below to match your system/setup. Do NOT blindly copy and paste these commands!
Basic zpool data
zdb
VD02:
    version: 5000
    name: 'VD02'
    state: 0
    txg: 36436836
    pool_guid: 15889708516376535445
    errata: 0
    hostid: 1072610241
    hostname: 'nas'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 15889708516376535445
        children[0]:
            type: 'raidz'
            id: 0
            guid: 17549503825929563963
            nparity: 1
            metaslab_array: 41
            metaslab_shift: 37
            ashift: 12
            asize: 15994523222016
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 36
            children[0]:
                type: 'disk'
                id: 0
                guid: 8705866138931328345
                path: '/dev/sdd2'
                phys_path: 'id1,enc@n3061686369656d30/type@0/slot@7/elmdesc@Slot_06/p2'
                DTL: 808
                create_txg: 4
                com.delphix:vdev_zap_leaf: 37
            children[1]:
                type: 'disk'
                id: 1
                guid: 1854769443114578821
                path: '/dev/sde2'
                phys_path: 'id1,enc@n3061686369656d30/type@0/slot@8/elmdesc@Slot_07/p2'
                DTL: 807
                create_txg: 4
                com.delphix:vdev_zap_leaf: 38
            children[2]:
                type: 'disk'
                id: 2
                guid: 16276421053278468804
                path: '/dev/sdc2'
                phys_path: 'id1,enc@n3061686369656d30/type@0/slot@6/elmdesc@Slot_05/p2'
                DTL: 806
                create_txg: 4
                com.delphix:vdev_zap_leaf: 39
            children[3]:
                type: 'disk'
                id: 3
                guid: 18196651054978308194
                path: '/dev/sdb2'
                phys_path: 'id1,enc@n3061686369656d30/type@0/slot@5/elmdesc@Slot_04/p2'
                DTL: 805
                create_txg: 4
                com.delphix:vdev_zap_leaf: 40
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
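Since the zdb output is long, it can help to pull out just the GUID and path lines to see at a glance which GUID belongs to which device (this also matches pool_guid and phys_path, which doesn't hurt here):
zdb | grep -E "guid|path"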
zpool status VD02
pool: VD02
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub repaired 0B in 0 days 11:18:43 with 0 errors on Thu Dec 8 01:11:22 2023
config:
        NAME        STATE     READ WRITE CKSUM
        VD02        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdd2    ONLINE       0     0     0
            sde2    ONLINE       0     0     7
            sdc2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
errors: No known data errors
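As a quick sanity check, zpool status -x prints only the pools that are reporting a problem, which is handy if you have more than one pool:
zpool status -x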
Basic drive information
To list the disks by their path:
ll /dev/disk/by-path
total 0
lrwxrwxrwx 1 root root 9 Dec 8 12:46 pci-0000:00:17.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-1-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-1-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-1-part3 -> ../../sda3
lrwxrwxrwx 1 root root 9 Dec 8 12:46 pci-0000:00:17.0-ata-5 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-5-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-5-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 9 Dec 8 12:46 pci-0000:00:17.0-ata-6 -> ../../sdc
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-6-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-6-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 9 Dec 8 12:46 pci-0000:00:17.0-ata-7 -> ../../sdd
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-7-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-7-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 9 Dec 8 12:46 pci-0000:00:17.0-ata-8 -> ../../sde
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-8-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-8-part2 -> ../../sde2
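You can also resolve a single by-path entry directly rather than eyeballing the whole listing; for the ata-8 link shown above:
readlink -f /dev/disk/by-path/pci-0000:00:17.0-ata-8
/dev/sde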
The by-path listing shows that the drive on ata-8 is sde (sde2 is the ZFS partition on that drive).
Now we can get the serial number of the drive in question:
smartctl -a /dev/sde
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-137-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E4PS1KAU
...
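While I'm in smartctl anyway, two more checks give a quick read on how unhealthy the drive really is (attribute names vary a little between vendors, so treat the grep pattern as a rough filter):
smartctl -H /dev/sde
smartctl -A /dev/sde | grep -iE "reallocated|pending|uncorrect"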
Before I go in and start mucking with the system, I want to ensure that I'm replacing the correct drive, so I've reviewed the following:
- The output of /var/log/kern.log reports that ata8.00 is showing errors.
- The command zpool status shows that sde2 has checksum errors.
- The command zdb shows that sde2 has a GUID of 1854769443114578821.
- The command ls -l /dev/disk/by-path shows that ata-8 is sde.
- The command smartctl -a /dev/sde shows this drive has a serial number of WD-WCC4E4PS1KAU.
Check installed drives
lsblk | grep -w "sd."
sda 8:0 0 111.8G 0 disk
sdb 8:16 0 3.7T 0 disk
sdc 8:32 0 3.7T 0 disk
sdd 8:48 0 3.7T 0 disk
sde 8:64 0 3.7T 0 disk
The grep -w "sd." matches whole words only (-w) consisting of sd followed by exactly one additional character (the . matches any single character), so partitions such as sda1 are not listed.
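If you'd rather not lean on grep, lsblk can skip the partitions itself; the -d (--nodeps) option prints only whole disks:
lsblk -d -o NAME,SIZE,TYPE,MODEL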
Remove the faulty drive from the zpool
zpool offline VD02 /dev/sde2
If the drive is completely dead, you would need to specify the GUID (gathered from the zdb command above) instead:
zpool offline VD02 1854769443114578821
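If your version of ZFS supports it, zpool status -g displays vdev GUIDs in place of the device names, which is an easier way to double-check the GUID than digging through the zdb output:
zpool status -g VD02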
Check to make sure the drive shows as OFFLINE:
zpool status VD02 -v
pool: VD02
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0B in 0 days 11:18:43 with 0 errors on Thu Dec 8 01:11:22 2023
config:
        NAME        STATE     READ WRITE CKSUM
        VD02        DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            sdd2    ONLINE       0     0     0
            sde2    OFFLINE      0     0     0
            sdc2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
errors: No known data errors
Remove the drive (sde in this example) from the subsystem:
echo 1 > /sys/block/sde/device/delete
If you're using sudo to run this command, you would run this instead:
sudo sh -c "echo 1 > /sys/block/sde/device/delete"
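Before physically pulling the drive, it doesn't hurt to confirm the kernel has actually let go of it: sde should no longer appear in lsblk, and the tail of the kernel log should show the device being detached:
lsblk | grep -w "sd."
dmesg | tail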
Add the new drive to the zpool
After swapping the drives, verify the new drive shows up in the system:
lsblk | grep -w "sd."
sda 8:0 0 111.8G 0 disk
sdb 8:16 0 3.7T 0 disk
sdc 8:32 0 3.7T 0 disk
sdd 8:48 0 3.7T 0 disk
sde 8:64 0 9.1T 0 disk
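If the replacement drive has been used before, it's worth checking it for leftover partition or filesystem signatures before handing it to ZFS; run with no options, wipefs only reports what it finds and doesn't erase anything:
wipefs /dev/sde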
Add the new disk to the zpool
For zpools of fewer than 10 disks, the recommendation is to use /dev/disk/by-id/ rather than /dev/sdX.
Given that recommendation, I need to get a list of the drives:
ls -la /dev/disk/by-id/ | grep "ata.*sd[a-z]$"
lrwxrwxrwx 1 root root 9 Dec 8 06:48 ata-SanDisk_SDSSDA120G_170260449110 -> ../../sde
lrwxrwxrwx 1 root root 9 Dec 8 06:48 ata-WDC_WD102KFBX-68M95N0_VCKWASHP -> ../../sda
lrwxrwxrwx 1 root root 9 Dec 8 06:48 ata-WDC_WD102KFBX-68M95N0_VHGA3N0M -> ../../sdb
lrwxrwxrwx 1 root root 9 Dec 8 06:48 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0AR83LS -> ../../sdc
lrwxrwxrwx 1 root root 9 Dec 8 06:48 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2JHNVN3 -> ../../sdd
The grep "ata.*sd[a-z]$" matches lines containing ata, then anything, ending in sd plus a single letter (a-z), so we don't see partitions such as sda1.
The correct format is zpool replace <zpool name> <old drive> <new drive>
zpool replace VD02 sde2 ata-WDC_WD102KFBX-68M95N0_VCKWASHP
Verify the new drive is in the zpool
zpool status VD02 -v
pool: VD02
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Thu Dec 8 18:16:09 2023
4.22T scanned at 25.7G/s, 7.67G issued at 46.7M/s, 12.1T total
1.75G resilvered, 0.06% done, 3 days 03:16:34 to go
config:
        NAME                                      STATE     READ WRITE CKSUM
        VD02                                      DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            sdd2                                  ONLINE       0     0     0
            replacing-1                           DEGRADED     0     0     0
              sde2                                OFFLINE      0     0     0
              ata-WDC_WD102KFBX-68M95N0_VCKWASHP  ONLINE       0     0     0  (resilvering)
            sdc2                                  ONLINE       0     0     0
            sdb2                                  ONLINE       0     0     0
errors: No known data errors
It did not take 3 days to resilver!
Depending on how much you have stored in your zpool, the resilvering may take a while, but you can check the status by running zpool status VD02 -v.
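If you don't want to keep rerunning that by hand, watch will refresh it for you every minute:
watch -n 60 zpool status VD02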
In my case, after about 12 hours:
zpool status VD02 -v
pool: VD02
state: ONLINE
scan: resilvered 2.89T in 0 days 12:05:18 with 0 errors on Fri Dec 9 06:21:27 2023
config:
        NAME        STATE     READ WRITE CKSUM
        VD02        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdd2    ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdc2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
errors: No known data errors
Until I replace all of the drives in the ZFS array with 10TB drives, I won't get more than 4TB out of this new 10TB drive, but that's expected.
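A note for when the last 4TB drive is finally swapped out: the extra capacity only becomes available if autoexpand is enabled on the pool (or each device is expanded manually with zpool online -e). A sketch for later, using this pool name and the new drive's by-id name:
zpool set autoexpand=on VD02
zpool online -e VD02 ata-WDC_WD102KFBX-68M95N0_VCKWASHP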
References
zfsonlinux.org - Article for Message ID: ZFS-8000-9P - Failing device in replicated configuration https://zfsonlinux.org/msg/ZFS-8000-9P/
OpenZFS - FAQ - Selecting /dev/ names when creating a pool (Linux) https://openzfs.github.io/openzfs-docs/Project%20and%20Community/FAQ.html#selecting-dev-names-when-creating-a-pool-linux
Red Hat Customer Portal - Online Storage Reconfiguration Guide > Removing a Storage Device https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/online_storage_reconfiguration_guide/removing_devices