Everyone knows that you can replace the drives in a ZFS vdev with larger ones one at a time, and when the last one is inserted it automagically uses the extra space, right?
But who’s actually done this? It does work, kind of.
However, small-scale ZFS users are booting from ZFS, and have been since FreeBSD 10. Simply swapping out the drives for larger ones isn’t going to work. It can’t work. You’ve got boot code, swap partitions and other stuff complicating matters. But it can be made to work, and here’s how.
The first thing you need to consider is that ZFS is a volume manager, and normally when you create an array (RAIDZ or mirror) it expects to manage the whole disk. When you’re creating a bootable system you need bootstraps on the disk to actually boot from it. FreeBSD can do this, and does by default since FreeBSD 10 was released in 2014. The installer handles the tricky business of partitioning the disks and making sure the system will still boot when one drive is missing.
If you look at the partition table on one of the disks in the array you’ll see something like this:
=>        40  5860533088  ada0  GPT  (2.7T)
          40        1024     1  freebsd-boot  (512K)
        1064         984        - free -  (492K)
        2048     4194304     2  freebsd-swap  (2.0G)
     4196352  5856335872     3  freebsd-zfs  (2.7T)
  5860532224         904        - free -  (452K)
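That listing comes from gpart show; you can check any disk in your own array the same way (the disks on this particular box show up as ada#):

gpart show ada0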
So what’s going on here?
We’re using the modern GPT partitioning scheme. You may as well go with the flow (but see articles about gmirror). This is a so-called 3Tb SATA disk, but it’s really 2.7Tb: manufacturers count in powers of ten, so their 3Tb is 3 × 10^12 bytes, which is about 2.73 binary Tb (a binary Tb being 2^40 bytes). FreeBSD measures Tb, Gb, Mb and Kb in binary, so the numbers you see here won’t always match the label on the drive.
The disk starts with 40 sectors reserved for the protective MBR and the GPT partition table, followed by the partitions themselves.
The first partition is 512K long and contains the freebsd-boot code. 512K is a lot of boot code, but ZFS is a complicated file system, so it takes quite a lot of code to read it before the OS kernel is loaded.
The second partition is freebsd-swap. This is just a block of disk space the kernel can use for paging. By labelling it freebsd-swap, FreeBSD can find it and use it. On an array, each drive has a bit of paging space so the load is shared across all of them. It doesn’t have to be this way, but it’s how the FreeBSD installer does it. If you have an SLOG drive it might make sense to put all the swap on that.
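In practice the installer typically wires each swap partition into /etc/fstab with an entry along these lines (the device name here is illustrative):

# Device      Mountpoint  FStype  Options  Dump  Pass#
/dev/da0p2    none        swap    sw       0     0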
The third partition is actually used for ZFS, and is the bulk of the disk.
You might be wondering what the “- free -” space is all about. For performance reasons it’s good practice to align partitions to a particular grain size, in this case 1Mb: you can see it in the numbers above, where freebsd-swap starts at sector 2048, and 2048 sectors × 512 bytes = 1Mb. I won’t go into it here; suffice it to say that the FreeBSD installer knows what it’s doing, and has left the appropriate gaps.
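If you’re ever partitioning by hand, you don’t need to do the sums yourself: gpart’s -a flag rounds the start and size to the grain for you. A sketch, not something you need for this procedure:

gpart add -t freebsd-swap -a 1m -s 2g da0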
As I said, ZFS expects to have a whole disk to play with, so normally you’d create an array with something like this:
zpool create mypool raidz1 da0 da1 da2 da3
This creates a RAIDZ1 called mypool out of four drives. But ZFS will also work with geoms (partitions). With the partition scheme shown above, the creation command would be:
zpool create mypool raidz1 da0p3 da1p3 da2p3 da3p3
ZFS would use partition 3 on all four drives and leave the boot code and swap area alone. And this is effectively what the installer does: da#p2 is used for swap, and da#p1 holds the boot code, replicated so it’s available on any drive that’s still working and visible to the BIOS.
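For reference, recreating the installer’s layout on a blank disk by hand would look roughly like this (sizes taken from the gpart output above; treat it as a sketch rather than a supported recipe):

gpart create -s gpt da0
gpart add -t freebsd-boot -s 512k da0
gpart add -t freebsd-swap -a 1m -s 2g da0
gpart add -t freebsd-zfs -a 1m da0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0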
So, if we’re going to swap out our small drives for larger ones, we’ll have to deal with the extra complications of being bootable. Fortunately it’s not too hard. But before we start, if you want the pool to expand automatically you need to set an option:
zpool set autoexpand=on zroot
However, you can also expand it manually when you online the new drive using the -e option.
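You can check the current setting at any time with:

zpool get autoexpand zroot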
From here I’m going to assume a few things. We have a RAIDZ set up across four drives: da0, da1, da2 and da3. The new drives are larger, and blank (no partition table). You can sometimes get into trouble if they have the wrong stuff in the partition table, so blanking them is best, and if you blank the whole drive you’ll have some confidence it’s a good one. It’s also worth mentioning that you can’t shrink the pool by using smaller drives. You can only go bigger.
You’ll also have to turn off swapping, as we’ll be pulling drives with live swap partitions on them. (If nothing is actually paged out you might get away without, but don’t chance it.) Run swapctl -l to see what’s being used, and use swapoff to disable swapping on any drive we’re about to pull. Also, back up everything to tape or something before messing with any of this, right?
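Assuming the layout above, where p2 on each drive holds the swap, that’s something like this for da0 (repeat for each drive as you get to it):

swapctl -l
swapoff /dev/da0p2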
Ready to go? Starting with da0…
zpool offline zroot da0p3
Pull da0 and put the new drive in. It’s worth checking the console to make sure the drive you’ve pulled really is da0, and that the new drive has also been identified as da0. If you pull the wrong drive, put it back and use “zpool online zroot da0p3” to bring it back into the pool. The one you actually pulled will show as offline.
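If you want more certainty than the console messages, camcontrol will list what the kernel thinks is attached, and geom disk list will show you a drive’s serial number so you can match it to the label:

camcontrol devlist
geom disk list da0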
We could hand partition it, but it’s easier to simply copy the partition table from one of the other drives:
gpart backup da1 | gpart restore da0
This copies the old partition table over verbatim, so all the extra space is left unused at the end of the disk. We can fix this:
gpart resize -i 3 da0
When you don’t specify a new size with -s, this changes the third partition to take up all the remaining space. There’s no need to leave an alignment gap at the end, but if you want one you can do the arithmetic yourself: round the size down to a whole multiple of 2048 sectors, since 2048 × 512-byte sectors is the 1Mb grain. The only point I can see in doing this is if you’re going to add another aligned partition afterwards, but you’re not.
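That said, if you do want the end aligned, gpart will do the rounding for you with -a and you can skip the arithmetic:

gpart resize -i 3 -a 1m da0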
Next we’ll add the boot code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
And finally put it in the array:
zpool replace zroot da0p3
Run zpool status and watch as the array is rebuilt. This may take several hours, or even days.
Once the resilvering is complete and the array looks good we can do the same with the next drive:
zpool offline zroot da1p3
Swap the old disk for the new one and wait for the kernel to detect it.
gpart backup da0 | gpart restore da1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1
zpool replace zroot da1p3
Wait for resilvering to finish
zpool offline zroot da2p3
Swap the old disk for the new one and wait for the kernel to detect it.
gpart backup da0 | gpart restore da2
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da2
zpool replace zroot da2p3
Wait for resilvering to finish
zpool offline zroot da3p3
Swap the old disk for the new one and wait for the kernel to detect it.
gpart backup da0 | gpart restore da3
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da3
zpool replace zroot da3p3
Wait for resilvering to finish. Your pool is now expanded!
If you didn’t have autoexpand enabled you’ll need to trigger the expansion manually with “zpool online -e zroot da#p3” on each device; the -e flag tells ZFS to grow into the new space.
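For the four-drive example above that’s:

zpool online -e zroot da0p3
zpool online -e zroot da1p3
zpool online -e zroot da2p3
zpool online -e zroot da3p3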