ZFS RAIDZ disk change

2018-11-13

Here are some notes on replacing a failing disk in a RAIDZ pool. This has been tested on FreeBSD 11.2. It may work with other versions, but check gpart(8), zpool(8) and the Handbook to be sure.

My NAS runs FreeBSD 11.2 with zroot, 4x3TB disks in raidz1. Some days ago one of those disks started to report quite a few SMART errors. ZFS itself did not report any errors, but I prefer to replace the disk while it still works. It's probably faster (a straight copy instead of a parity rebuild) and safer, as one does not face the possibility of another disk failing while rebuilding the RAID.

In this particular case ada2 was the failing disk and ada4 the new one. The device numbering will change once the failing disk is removed, but I don't care, as I use GPT labels.
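If you ever need to map a GPT label back to its current device name, glabel(8) can do it. A quick check (gpt/zfs2 being the label of the failing disk's ZFS partition in my scheme):

```shell
# Show which partition currently backs the label; the Components
# column prints the provider, e.g. ada2p3.
glabel status | grep gpt/zfs2
```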

I don't like GPT GUID labels or diskid labels (although I see the point of the latter when you have a bunch of disks…). So I have this in /boot/loader.conf:

kern.geom.label.gptid.enable="0"
kern.geom.label.disk_ident.enable="0"

The first thing is to create the GPT partition table on the new disk:

gpart create -s GPT ada4

Then replicate the same partition scheme on the new disk (in my particular case the replacement disk and the replaced disk are the same model):

gpart backup ada2 | gpart restore -F ada4
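Since gpart backup dumps the scheme as plain text, it's easy to double-check that both tables now match. A quick sanity check (the /tmp file names are just an example):

```shell
# Compare both partition tables; diff prints nothing when they match.
gpart backup ada2 > /tmp/ada2.scheme
gpart backup ada4 > /tmp/ada4.scheme
diff /tmp/ada2.scheme /tmp/ada4.scheme
```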

This only replicates the partition scheme, not the labels, so those have to be set manually:

gpart modify -i 3 -l zfs4 ada4
gpart modify -i 2 -l swap4 ada4
gpart modify -i 1 -l gptboot4 ada4
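The labels follow a simple pattern: zfs, swap and gptboot, each suffixed with the disk number. If you script this step, the three commands can be generated instead of typed. A minimal sketch (the disknum and dev variables are my own naming, not part of gpart):

```shell
#!/bin/sh
# Build the three gpart label commands for one disk, following the
# zfsN / swapN / gptbootN naming scheme used above.
disknum=4
dev=ada4
cmds="gpart modify -i 3 -l zfs$disknum $dev
gpart modify -i 2 -l swap$disknum $dev
gpart modify -i 1 -l gptboot$disknum $dev"
echo "$cmds"
```

Once the output looks right, copy the lines by hand or pipe them to sh.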

As you can see from my scheme, each disk has a boot partition, a swap partition and another partition which is part of the zpool.

At this point, we're ready to replace the disk:

zpool replace zroot gpt/zfs2 gpt/zfs4

This can take a lot of time, depending on your hardware. In my case it took over 10 hours.
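While it runs, zpool status reports the progress, so a small loop can keep an eye on it. A sketch, assuming the pool name zroot and FreeBSD's "resilver in progress" status line:

```shell
# Poll the pool every 60 seconds until the resilver completes.
while zpool status zroot | grep -q 'resilver in progress'; do
    zpool status zroot | grep -e scanned -e 'to go'
    sleep 60
done
echo "resilver finished"
```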

It's a good idea to install the bootloader on the new disk now:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada4

Once the resilver finishes, everything is back to normal:

pool: zroot
state: ONLINE
scan: resilvered 2.15T in 10h23m with 0 errors on Tue Nov 13 04:31:35 2018

config:

      NAME          STATE     READ WRITE CKSUM
      zroot         ONLINE       0     0     0
        raidz1-0    ONLINE       0     0     0
          gpt/zfs0  ONLINE       0     0     0
          gpt/zfs1  ONLINE       0     0     0
          gpt/zfs4  ONLINE       0     0     0
          gpt/zfs3  ONLINE       0     0     0

errors: No known data errors

As a bonus, these commands can help a lot when gathering information about the disks, partitions and status:

zpool status
gpart show
gpart backup <provider>
camcontrol devlist

Take a look at the respective man pages before executing anything on your machine!