Replacing a drive in a ZFS array in Ubuntu

Sooner or later, every system will have a drive failure.

ZFS was designed with this in mind.

I noticed error messages in /var/log/kern.log about one of the drives in my ZFS pool.
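
A quick grep pulls the relevant lines out of the log (adjust ata8 to whatever port your controller reports):

grep 'ata8' /var/log/kern.log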

Dec  8 07:11:17 nas kernel: [738110.398391] ata8.00: exception Emask 0x0 SAct 0x80000004 SErr 0x0 action 0x0
Dec  8 07:11:17 nas kernel: [738110.398437] ata8.00: irq_stat 0x40000008
Dec  8 07:11:17 nas kernel: [738110.398460] ata8.00: failed command: READ FPDMA QUEUED
Dec  8 07:11:17 nas kernel: [738110.398490] ata8.00: cmd 60/e0:10:00:b9:08/07:00:07:00:00/40 tag 2 ncq dma 1032192 in
Dec  8 07:11:17 nas kernel: [738110.398490]          res 41/40:00:90:be:08/00:00:07:00:00/40 Emask 0x409 (media error) <F>
Dec  8 07:11:17 nas kernel: [738110.398564] ata8.00: status: { DRDY ERR }
Dec  8 07:11:17 nas kernel: [738110.398585] ata8.00: error: { UNC }

Considering I've had these drives spinning 24x7 for almost 10 years now, I can't complain that one of them finally started having issues.

Given that all 4 drives in the NAS are coming up on 10 years old, I figure the other 3 will probably start having issues sooner rather than later.

In addition, I've been looking at upgrading drives to increase the space available on the NAS.

My NAS currently has four 4TB drives in a raidz1 configuration, which gives me 12TB of space (raidz1 spends one drive's worth of capacity on parity, so (4 - 1) x 4TB = 12TB).

I've decided to upgrade the drives, one at a time as my budget allows, to 10TB drives.

The name of my zpool, the GUID, the controller, and the drive will be different in your system.

Verify and change the commands below to match your system / setup. Do NOT blindly copy and paste these commands!

Basic zpool data

zdb
VD02:
   version: 5000
   name: 'VD02'
   state: 0
   txg: 36436836
   pool_guid: 15889708516376535445
   errata: 0
   hostid: 1072610241
   hostname: 'nas'
   com.delphix:has_per_vdev_zaps
   vdev_children: 1
   vdev_tree:
       type: 'root'
       id: 0
       guid: 15889708516376535445
       children[0]:
           type: 'raidz'
           id: 0
           guid: 17549503825929563963
           nparity: 1
           metaslab_array: 41
           metaslab_shift: 37
           ashift: 12
           asize: 15994523222016
           is_log: 0
           create_txg: 4
           com.delphix:vdev_zap_top: 36
           children[0]:
               type: 'disk'
               id: 0
               guid: 8705866138931328345
               path: '/dev/sdd2'
               phys_path: 'id1,enc@n3061686369656d30/type@0/slot@7/elmdesc@Slot_06/p2'
               DTL: 808
               create_txg: 4
               com.delphix:vdev_zap_leaf: 37
           children[1]:
               type: 'disk'
               id: 1
               guid: 1854769443114578821
               path: '/dev/sde2'
               phys_path: 'id1,enc@n3061686369656d30/type@0/slot@8/elmdesc@Slot_07/p2'
               DTL: 807
               create_txg: 4
               com.delphix:vdev_zap_leaf: 38
           children[2]:
               type: 'disk'
               id: 2
               guid: 16276421053278468804
               path: '/dev/sdc2'
               phys_path: 'id1,enc@n3061686369656d30/type@0/slot@6/elmdesc@Slot_05/p2'
               DTL: 806
               create_txg: 4
               com.delphix:vdev_zap_leaf: 39
           children[3]:
               type: 'disk'
               id: 3
               guid: 18196651054978308194
               path: '/dev/sdb2'
               phys_path: 'id1,enc@n3061686369656d30/type@0/slot@5/elmdesc@Slot_04/p2'
               DTL: 805
               create_txg: 4
               com.delphix:vdev_zap_leaf: 40
   features_for_read:
       com.delphix:hole_birth
       com.delphix:embedded_data
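
If you just want the GUID for one device, grepping the zdb output is quicker (a convenience one-liner; the device path is from my system):

zdb | grep -B1 "path: '/dev/sde2'"
                guid: 1854769443114578821
                path: '/dev/sde2'
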
zpool status VD02
  pool: VD02
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 0 days 11:18:43 with 0 errors on Thu Dec 8 01:11:22 2023
config:

	NAME        STATE     READ WRITE CKSUM
	VD02        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdd2    ONLINE       0     0     0
	    sde2    ONLINE       0     0     7
	    sdc2    ONLINE       0     0     0
	    sdb2    ONLINE       0     0     0

errors: No known data errors
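
As an aside, zpool status -x gives a quick yes/no health check (handy in a cron job); it only lists pools that are exhibiting problems:

zpool status -x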

Basic drive information

To list the disks by their path:

ll /dev/disk/by-path
total 0
lrwxrwxrwx 1 root root  9 Dec 8 12:46 pci-0000:00:17.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-1-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-1-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-1-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Dec 8 12:46 pci-0000:00:17.0-ata-5 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-5-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-5-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 Dec 8 12:46 pci-0000:00:17.0-ata-6 -> ../../sdc
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-6-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-6-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  9 Dec 8 12:46 pci-0000:00:17.0-ata-7 -> ../../sdd
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-7-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-7-part2 -> ../../sdd2
lrwxrwxrwx 1 root root  9 Dec 8 12:46 pci-0000:00:17.0-ata-8 -> ../../sde
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-8-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Dec 8 12:46 pci-0000:00:17.0-ata-8-part2 -> ../../sde2

The above command tells us that the drive on ata-8 is sde, and its second partition (the one in the pool) is sde2.
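
You can also resolve a single link directly with readlink:

readlink -f /dev/disk/by-path/pci-0000:00:17.0-ata-8
/dev/sde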

Now we can get the serial number of the drive in question:

smartctl -a /dev/sde
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-137-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68WT0N0
Serial Number:    WD-WCC4E4PS1KAU
...
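
If you only want the serial number, the info section (-i) plus a grep is enough:

smartctl -i /dev/sde | grep 'Serial Number'
Serial Number:    WD-WCC4E4PS1KAU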

Before I go in and start mucking with the system, I want to ensure that I'm replacing the correct drive, so I've reviewed the following (the loop after this list is a handy final cross-check):

  • The log file /var/log/kern.log reports that ata8.00 is showing errors.

  • The command zpool status shows that sde2 has checksum errors.

  • The command zdb shows that sde2 has a GUID of 1854769443114578821.

  • The command ls -l /dev/disk/by-path shows that ata-8 is sde.

  • The command smartctl -a /dev/sde shows this drive has a serial number of WD-WCC4E4PS1KAU.
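
As that final cross-check, this loop prints the serial number of every drive so they can be matched against the physical labels (a quick sketch; it assumes the data drives all show up as /dev/sd? and that it's run as root):

for d in /dev/sd?; do
    printf '%s: ' "$d"
    smartctl -i "$d" | grep 'Serial Number'
done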

Check installed drives

lsblk | grep -w "sd."
sda                         8:0    0 111.8G  0 disk
sdb                         8:16   0   3.7T  0 disk
sdc                         8:32   0   3.7T  0 disk
sdd                         8:48   0   3.7T  0 disk
sde                         8:64   0   3.7T  0 disk

The grep -w "sd." matches a whole word (-w) consisting of sd plus exactly one additional character (the .), so whole disks match but partitions such as sda1 do not.

Remove the faulty drive from the zpool

zpool offline VD02 /dev/sde2

If the drive is completely dead, you would need to specify the GUID (gathered from the zdb command above) instead:

zpool offline VD02 1854769443114578821

Check to make sure the drive shows as OFFLINE:

zpool status VD02 -v
  pool: VD02
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: scrub repaired 0B in 0 days 11:18:43 with 0 errors on Thu Dec 8 01:11:22 2023
config:

	NAME        STATE     READ WRITE CKSUM
	VD02        DEGRADED     0     0     0
	  raidz1-0  DEGRADED     0     0     0
	    sdd2    ONLINE       0     0     0
	    sde2    OFFLINE      0     0     0
	    sdc2    ONLINE       0     0     0
	    sdb2    ONLINE       0     0     0

errors: No known data errors

Remove the drive (sde in this example) from the SCSI subsystem so it can be pulled safely:

echo 1 > /sys/block/sde/device/delete

If you're using sudo, note that simply prefixing the command with sudo isn't enough, because the redirection is performed by your unprivileged shell; run this instead:

sudo sh -c "echo 1 > /sys/block/sde/device/delete"
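
Equivalently, you can pipe through tee, which keeps the write on the privileged side:

echo 1 | sudo tee /sys/block/sde/device/delete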

Swap in the new drive

After physically swapping the drives, verify the new one shows up in the system:

lsblk | grep -w "sd."
sda                         8:0    0 111.8G  0 disk
sdb                         8:16   0   3.7T  0 disk
sdc                         8:32   0   3.7T  0 disk
sdd                         8:48   0   3.7T  0 disk
sde                         8:64   0   9.1T  0 disk
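
If the new drive doesn't appear after a hot swap, you may need to trigger a rescan of the SATA host first (host7 here is a guess on my part; check /sys/class/scsi_host/ for the one matching your controller):

sudo sh -c 'echo "- - -" > /sys/class/scsi_host/host7/scan'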

Add the new disk to the zpool

For zpools of less than 10 disks, the recommendation is to use /dev/disk/by-id/ and not /dev/sdX.

Given that recommendation, I need to get a list of the drives:

ls -la /dev/disk/by-id/ | grep "ata.*sd[a-z]$"
lrwxrwxrwx 1 root root    9 Dec  8 06:48 ata-SanDisk_SDSSDA120G_170260449110 -> ../../sde
lrwxrwxrwx 1 root root    9 Dec  8 06:48 ata-WDC_WD102KFBX-68M95N0_VCKWASHP -> ../../sda
lrwxrwxrwx 1 root root    9 Dec  8 06:48 ata-WDC_WD102KFBX-68M95N0_VHGA3N0M -> ../../sdb
lrwxrwxrwx 1 root root    9 Dec  8 06:48 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0AR83LS -> ../../sdc
lrwxrwxrwx 1 root root    9 Dec  8 06:48 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2JHNVN3 -> ../../sdd

The grep "ata.*sd[a-z]$" matches lines containing ata, then anything in between, then sd followed by a single letter at the end of the line, so we don't see partitions, such as sda1, etc.

The correct format is zpool replace <zpool name> <old drive> <new drive>:

zpool replace VD02 sde2 ata-WDC_WD102KFBX-68M95N0_VCKWASHP

Verify the new drive is in the zpool

zpool status VD02 -v
  pool: VD02
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Dec 8 18:16:09 2023
	4.22T scanned at 25.7G/s, 7.67G issued at 46.7M/s, 12.1T total
	1.75G resilvered, 0.06% done, 3 days 03:16:34 to go
config:

	NAME                                      STATE     READ WRITE CKSUM
	VD02                                      DEGRADED     0     0     0
	  raidz1-0                                DEGRADED     0     0     0
	    sdd2                                  ONLINE       0     0     0
	    replacing-1                           DEGRADED     0     0     0
	      sde2                                OFFLINE      0     0     0
	      ata-WDC_WD102KFBX-68M95N0_VCKWASHP  ONLINE       0     0     0  (resilvering)
	    sdc2                                  ONLINE       0     0     0
	    sdb2                                  ONLINE       0     0     0

errors: No known data errors

It did not take 3 days to resilver!

Depending on how much data you have stored in your zpool, the resilvering may take a while, but you can check the status by running zpool status VD02 -v.
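
If you'd rather have that refresh on its own, watch does the trick:

watch -n 60 zpool status VD02 -v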

In my case, after about 12 hours:

zpool status VD02 -v
  pool: VD02
 state: ONLINE
  scan: resilvered 2.89T in 0 days 12:05:18 with 0 errors on Fri Dec 9 06:21:27 2023
config:

	NAME        STATE     READ WRITE CKSUM
	VD02        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdd2    ONLINE       0     0     0
	    sde     ONLINE       0     0     0
	    sdc2    ONLINE       0     0     0
	    sdb2    ONLINE       0     0     0

errors: No known data errors

Until I replace all of the drives in the ZFS array with 10TB drives, I won't get more than 4TB out of this new 10TB drive, but that's expected.
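
Worth noting for that final swap: the pool only grows on its own if the autoexpand property is enabled, so before replacing the last drive I'll turn it on (alternatively, zpool online -e VD02 <device> expands a device after the fact):

zpool set autoexpand=on VD02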

References

zfsonlinux.org - Article for Message ID: ZFS-8000-9P - Failing device in replicated configuration https://zfsonlinux.org/msg/ZFS-8000-9P/

OpenZFS - FAQ - Selecting /dev/ names when creating a pool (Linux) https://openzfs.github.io/openzfs-docs/Project%20and%20Community/FAQ.html#selecting-dev-names-when-creating-a-pool-linux

Red Hat Customer Portal - Online Storage Reconfiguration Guide > Removing a Storage Device https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/online_storage_reconfiguration_guide/removing_devices