Everyone knows that you can replace the drives in a ZFS vdev with larger ones one at a time, and when the last one is inserted it automagically uses the extra space, right? But who’s actually done this? It does actually work, kind of.
However, small-scale ZFS users are often booting from ZFS, and have been since FreeBSD 10. Simply swapping out the drives for larger ones isn’t going to work. It can’t work. You’ve got boot code, swap partitions and other stuff complicating things. But it can be made to work, and here’s how.
The first thing you need to consider is that ZFS is a volume manager, and normally when you create an array (RAIDZ or mirror) it expects to manage the whole disk. When you’re booting from the pool, though, each disk also needs bootstrap code on it. FreeBSD can do this, and does by default since FreeBSD 10 was released in 2014. The installer handles the tricky business of partitioning the disks and making sure the system will still boot when one drive is missing.
If you look at the partition table on one of the disks in the array you’ll see something like this:
=>          40  5860533088  ada0  GPT  (2.7T)
            40        1024     1  freebsd-boot  (512K)
          1064         984        - free -  (492K)
          2048     4194304     2  freebsd-swap  (2.0G)
       4196352  5856335872     3  freebsd-zfs  (2.7T)
    5860532224         904        - free -  (452K)
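That listing is the output of gpart show; assuming your first disk is ada0, you can inspect your own with:

gpart show ada0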
So what’s going on here?

We’re using the modern GPT partitioning scheme. You may as well go with the flow (but see articles about gmirror). This is a so-called 3TB SATA disk, but it’s really 2.7TB: manufacturers count a TB as 10^12 bytes, whereas FreeBSD counts TB, GB, MB and KB in binary (a TB being 2^40 bytes). Do the arithmetic and 3 × 10^12 ÷ 2^40 ≈ 2.73, hence the 2.7T in the listing, so the numbers you see here won’t always match what’s on the drive label.
The disk starts with 40 sectors holding the GPT partition table, followed by the partitions themselves.
The first partition is 512K long and contains the freebsd-boot code. 512K is a lot of boot code, but ZFS is a complicated file system, so the loader needs quite a lot of code to be able to read it before the OS kernel is loaded.
The second partition is freebsd-swap. This is just a block of disk space the kernel can use for paging. By labelling it freebsd-swap, FreeBSD can find it and use it. On an array, each drive has a bit of paging space so the load is shared across all of them. It doesn’t have to be this way, but it’s how the FreeBSD installer does it. If you have an SLOG drive it might make sense to put all the swap on that.
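On a stock install the swap partitions end up listed in /etc/fstab; with four drives named ada0 to ada3 (assumed here) the entries would look something like this:

# Device       Mountpoint  FStype  Options  Dump  Pass#
/dev/ada0p2    none        swap    sw       0     0
/dev/ada1p2    none        swap    sw       0     0
/dev/ada2p2    none        swap    sw       0     0
/dev/ada3p2    none        swap    sw       0     0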
The third partition is actually used for ZFS, and is the bulk of the disk.
You might be wondering what the “- free -” space is all about. For performance reasons it’s good practice to align partitions to a particular grain size, in this case 1MB. I won’t go into it here; suffice to say that the FreeBSD installer knows what it’s doing, and has left the appropriate gaps.
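You don’t have to do that arithmetic yourself if you ever partition a disk by hand: gpart’s -a flag rounds the start and size to whatever alignment you give it. A hypothetical example, adding a ZFS partition aligned to 1MB on disk da0:

gpart add -a 1m -t freebsd-zfs da0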
As I said, ZFS expects to have a whole disk to play with, so normally you’d create an array with something like this:

zpool create mypool raidz1 da0 da1 da2 da3
This creates a RAIDZ1 called mypool out of four drives. But ZFS will also work with geoms (partitions). With the partition scheme shown above the creation command would be:

zpool create mypool raidz1 da0p3 da1p3 da2p3 da3p3
ZFS would use partition 3 on all four drives and leave the boot code and swap area alone. And this is effectively what the installer does: da#p2 is used for swap, and da#p1 holds the boot code, replicated on every drive so the BIOS can boot from whichever drives are still working.
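If you wanted to reproduce that layout by hand, the gpart sequence would look roughly like this (a sketch assuming a blank disk da0 and 2GB of swap, not the installer’s literal commands):

gpart create -s gpt da0                                    # new, empty GPT
gpart add -a 4k -s 512k -t freebsd-boot da0                # boot code
gpart add -a 1m -s 2g -t freebsd-swap da0                  # swap
gpart add -a 1m -t freebsd-zfs da0                         # the rest, for ZFS
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0  # install boot code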
So, if we’re going to swap out our small drives for larger ones we’re going to have to sort out the extra complications that come from being bootable. Fortunately it’s not too hard. But before we start, if you want the pool to expand automatically you need to set an option:
zpool set autoexpand=on zroot
However, you can also expand each device manually when you online the new drive, using the -e option (more on this at the end).
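You can check whether it’s already set with:

zpool get autoexpand zroot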
From here I’m going to assume a few things. We have a RAIDZ set up across four drives: da0, da1, da2 and da3. The new drives are larger, and blank (no partition table). Sometimes you can get into trouble if they have the wrong stuff in the partition table, so blanking them is best, and if you blank the whole drive you’ll have some confidence it’s a good one. It’s also worth mentioning that you can’t shrink the pool by using smaller drives. You can only go bigger.
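If a new drive arrives with old partitions on it, one way to wipe them (assuming it shows up as da4 while on the bench; double-check the device name) is:

gpart destroy -F da4               # force-remove the partition table
dd if=/dev/zero of=/dev/da4 bs=1m  # optional: zero the whole drive (slow)

Zeroing the whole drive is what gives you that extra confidence it’s a good one.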
You’ll also have to turn the swap off, as we’ll be pulling drives with live swap partitions on them. However, if nothing is actually using swap you’ll probably get away with it. Run swapctl -l to see what’s in use, and use swapoff to disable swapping on any drive we’re about to pull. Also, back up everything to tape or something before messing with any of this, right?
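For example, with our assumed device names:

swapctl -l           # list active swap devices and how much is in use
swapoff /dev/da0p2   # stop paging to the drive we’re about to pull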
Ready to go?
Starting with da0…
zpool offline zroot da0p3
Pull da0 and put the new drive in. It’s worth checking the console to make sure the drive you’ve pulled really is da0, and that the new drive is also identified as da0. If you pull the wrong drive, put it back and use “zpool online zroot” with the relevant partition (e.g. da1p3) to return it to the pool; the one you actually offlined will still show as offline in zpool status.
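If the console has scrolled past, camcontrol will show you what the kernel currently sees attached:

camcontrol devlist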
We could partition it by hand, but it’s easier to simply copy the partition table from one of the other drives:
gpart backup da1 | gpart restore da0
This copies a partition table sized for the old, smaller disk, so all the extra space will be left unused at the end. We can fix this:
gpart resize -i 3 da0
When you don’t specify a new size with -s, this grows the third partition to take up all the remaining space. There’s no need to leave an alignment gap at the end, but if you want one you can do the arithmetic: take the remaining space in 512-byte sectors, round it down to a multiple of 2048 (i.e. 1MB granularity), and pass that with -s. The only point I can see in doing this is if you’re going to add another aligned partition afterwards, but you’re not.
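Alternatively, gpart can do the rounding for you; a hypothetical equivalent that keeps the end of the partition 1MB-aligned:

gpart resize -i 3 -a 1m da0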
Next we’ll add the boot code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
And finally put it in the array:
zpool replace zroot da0p3
Run zpool status and watch as the array is rebuilt. This may take several hours, or even days.
Once the resilvering is complete and the array looks good we can do the same with the next drive:
zpool offline zroot da1p3
Swap the old and new disk and wait for it to come online.
gpart backup da0 | gpart restore da1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1
zpool replace zroot da1p3
Wait for resilvering to finish.
zpool offline zroot da2p3
Swap the old and new disk and wait for it to come online.
gpart backup da0 | gpart restore da2
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da2
zpool replace zroot da2p3
Wait for resilvering to finish.
zpool offline zroot da3p3
Swap the old and new disk and wait for it to come online.
gpart backup da0 | gpart restore da3
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da3
zpool replace zroot da3p3
Wait for resilvering to finish. Your pool is now expanded!
If you didn’t have autoexpand enabled you’ll need to expand each device manually: “zpool offline zroot da#p3” followed by “zpool online -e zroot da#p3”.
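Either way, you can confirm the result with:

zpool list zroot

The SIZE column should now reflect the larger drives; EXPANDSZ shows any space still waiting to be claimed.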