ZFS RAIDZ disk change
Here are some notes on replacing a failing disk in a RAIDZ pool.
This has been tested on FreeBSD 11.2. It may work with other versions, but check
zpool(8) and the handbook to be sure.
My NAS runs FreeBSD 11.2 with zroot, 4x3TB disks in raidz1. A few days ago one of those disks started to report quite a few SMART errors. ZFS itself did not report any errors, but I prefer to change the disk while it still works. It is probably faster (a copy rather than a rebuild) and safer, as one does not risk another disk failing while the RAID is rebuilding.
In this particular case ada2 was failing, and ada4 was the new disk.
The new disk's device name will change once the failing disk is removed, but I don't care as I
use GPT labels.
I don't like GPT GUID labels nor DiskID labels (although I see the point
of the latter when you have a bunch of disks …). So, I have this
The first thing is to create the GPT partition table:
gpart create -s GPT ada4
And replicate the same partition scheme on the new disk (in my particular case replacement disk and replaced disk are the same model):
gpart backup ada2 | gpart restore -F ada4
This only replicates the partition scheme, but not the labels. So that has to be done manually:
gpart modify -i 3 -l zfs4 ada4
gpart modify -i 2 -l swap4 ada4
gpart modify -i 1 -l gptboot4 ada4
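The three labelling commands follow one pattern, so they can be generated instead of typed by hand. What follows is only a minimal sketch: `label_cmds` is a hypothetical helper of mine, and it assumes the same layout as my disks (index 1 is boot, 2 is swap, 3 is zfs). It only prints the commands, so they can be reviewed before anything touches the disk.

```sh
# Hypothetical helper: print (not run) the gpart commands that label
# the partitions of disk $1 with the numeric suffix $2.
# Assumes the layout used in this post: 1 = boot, 2 = swap, 3 = zfs.
label_cmds() {
    disk=$1
    suffix=$2
    for pair in 1:gptboot 2:swap 3:zfs; do
        index=${pair%%:*}    # text before the colon
        name=${pair#*:}      # text after the colon
        echo "gpart modify -i $index -l $name$suffix $disk"
    done
}

# Print the commands for the new disk in this post.
label_cmds ada4 4
```

Once the printed commands look right, piping them to sh (`label_cmds ada4 4 | sh`) would run them.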
As you can see in my scheme, I have a boot partition on each disk, a swap partition and another partition which is part of the zpool.
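Before going further, it may be worth checking that both disks really share the same partition layout. A rough sketch, assuming that `gpart backup` prints index, type, start and size as the first four columns with the label after them (check the output on your system first); `layout_only` and the /tmp file names are hypothetical:

```sh
# Hypothetical helper: read `gpart backup` output on stdin and keep
# only index, type, start and size, dropping the label column.
# Lines with fewer than four fields (the scheme header) pass unchanged.
layout_only() {
    awk 'NF >= 4 { print $1, $2, $3, $4; next } { print }'
}

# Intended use on the real system (left commented out on purpose):
#   gpart backup ada2 | layout_only > /tmp/old.layout
#   gpart backup ada4 | layout_only > /tmp/new.layout
#   diff /tmp/old.layout /tmp/new.layout && echo "layouts match"
```

The labels are stripped because they are expected to differ between the two disks; anything else that diff reports deserves a second look.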
At this point, we're ready to replace the disk:
zpool replace zroot gpt/zfs2 gpt/zfs4
This can take a lot of time. It all depends on your hardware. In my case it took over 10h.
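Rather than checking by hand every now and then, the wait can be scripted. This is only a sketch: `resilver_running` is a hypothetical helper, and it assumes that `zpool status` prints the words "resilver in progress" on its scan line while the resilver runs, which is what I would expect on this FreeBSD version but is worth confirming on yours.

```sh
# Hypothetical helper: succeed (exit 0) while the `zpool status` text
# on stdin reports a running resilver, fail otherwise.
resilver_running() {
    grep -q 'resilver in progress'
}

# Intended use on the real system (left commented out on purpose):
#   while zpool status zroot | resilver_running; do sleep 600; done
#   echo "resilver finished"
```

The ten minute sleep is arbitrary; with a resilver that takes hours there is little point in polling more often.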
It is a good idea to set up the bootloader on the new disk now:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada4
Once finished, everything is back to normal:
  pool: zroot
 state: ONLINE
  scan: resilvered 2.15T in 10h23m with 0 errors on Tue Nov 13 04:31:35 2018

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            gpt/zfs0  ONLINE       0     0     0
            gpt/zfs1  ONLINE       0     0     0
            gpt/zfs4  ONLINE       0     0     0
            gpt/zfs3  ONLINE       0     0     0

errors: No known data errors
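The same check can be done from a script, for instance in a cron job. A minimal sketch, assuming the "state:" line looks like the one in the output above; `pool_state` is a hypothetical helper, not part of ZFS:

```sh
# Hypothetical helper: print the value of the first "state:" line
# found in `zpool status` output on stdin (e.g. ONLINE or DEGRADED).
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Intended use on the real system (left commented out on purpose):
#   [ "$(zpool status zroot | pool_state)" = "ONLINE" ] && echo "pool healthy"
```

It reads standard input instead of calling zpool itself, so it can be tried on saved output without touching the pool.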
As a bonus, these commands can help a lot when gathering information about the disks, partitions and pool status:
zpool status
gpart show
gpart backup <provider>
camcontrol devlist
Take a look at the respective man pages before executing anything on your machine!