ZFS is not always the answer. Bring back gmirror!

The ZFS bandwagon has momentum, but ZFS isn’t for everyone. UFS2 has a number of killer advantages in some applications.

ZFS is great if you want to store a very large number of normal files safely. Its copy-on-write (COW) design is a major advantage for backup, archiving and general data safety, and datasets allow you to fine-tune it in almost any way you can think of. However, in a few circumstances, UFS2 is better. In particular, large random-access files do badly with COW.

Unlike traditional file systems, ZFS never overwrites a block in a file in place; the new version always ends up at a different location. If a file started off contiguous it’ll pretty soon be fragmented to hell and performance will go off a cliff. Obvious victims will be databases and VM hard disk images. You can tune for these, but to get acceptable performance you need to throw money and resources at ZFS to bring it up to the same level. Basically you need huge RAM caches, possibly an SLOG, and you must never let your pool get more than 50% full. If you’re unlucky enough to end up at 80% full, ZFS turns off speed optimisations to devote more RAM to caching, as things are going to get very bad fragmentation-wise.

If these costs are a problem, stick with UFS. And for redundancy, there is still good old GEOM Mirror (gmirror). Unfortunately the documentation of this now-poor relation has lagged a bit, and what once worked as standard no longer does. So here are some tips.

The most common use of gmirror (with me anyway) is a twin-drive host. Basically I don’t want things to fail when a hard disk dies, so I add a second redundant drive. Such hosts (often 1U servers) don’t have space for more than two drives anyway – and it pays to keep things simple.

Setting up a gmirror is really simple. You create one using the “gmirror label” command. There is no “gmirror create” command; it really is called “label”, and it writes the necessary metadata label so that gmirror will recognise the drives (“gmirror destroy” is present and does exactly what you might expect).

So something like:

gmirror label gm0 ada1 ada2

will create a device called /dev/mirror/gm0 and it’ll contain ada1’s contents mirrored on to ada2 (once it’s copied it all in the background). Just use /dev/mirror/gm0 as any other GEOM (i.e. disk). Instead of calling it gm0 I could have called it gm1, system, data, flubnutz or anything else that made sense, but gm0 is a handy reminder that it’s the first geom mirror on the system and it’s shorter to type.

The eagle-eyed might have noticed I used ada1 and ada2 above. You’ve booted off ada0, right? So what happens if you try mirroring yourself with “gmirror label gm0 ada0 ada1”? Well this used to work, but in my experience it doesn’t any more. And on a twin-drive system, this is exactly what you want to do. But it is still possible, read on…

How to set up a twin-drive host booting from a geom mirror

First off, before you do anything (even installing FreeBSD) you need to set up your disks. Since the IBM XT, hard disks have been partitioned using an MBR (Master Boot Record) at the start. This is really old, naff, clunky and Microsoft. Those in the know have been using the far superior GPT system for ages, and it’s pretty cross-platform now. However, it doesn’t play nice with gmirror, so we’re going to use MBR instead. Trust me on this.

For the curious, know that GPT keeps a copy of the partition table at the beginning and end of the disk, but MBR only has one, stored at the front. gmirror keeps its metadata at the end of the disk, well away from the MBR but unfortunately in exactly the same spot as the spare GPT. You can hack the gmirror code so it doesn’t do this, or frig around with mirroring geoms rather than whole disks and somehow get it to boot, but my advice is to stick to MBR partitioning or BSDlabels, which is an extension. There’s not a lot of point in ever mounting your BSD boot drive on a non-BSD system, so you’re not losing much whatever you choose.
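To see the collision in numbers, here is a minimal sketch of the arithmetic. It assumes 512-byte sectors and the standard 128-entry GPT table (which takes 32 sectors, plus one for the backup header); the function names are my own and purely illustrative.

```shell
#!/bin/sh
# Sketch only: sector arithmetic for where the metadata ends up.
# Assumes 512-byte sectors and a standard 128-entry GPT table.

# LBA of the final sector of a provider holding $1 sectors in total.
# A whole-disk gmirror keeps its metadata here, and the backup GPT
# header lives in this same sector.
last_lba() {
    echo $(( $1 - 1 ))
}

# Highest LBA a GPT partition can occupy on a disk of $1 sectors:
# the final 33 sectors are taken by the backup table (32 sectors)
# and the backup header (1 sector).
gpt_last_usable_lba() {
    echo $(( $1 - 34 ))
}
```

For a disk of 20480 sectors, last_lba gives 20479 and gpt_last_usable_lba gives 20446. A mirror built on a partition keeps its metadata in the last sector of that partition, so it can never reach the backup GPT area, which is why mirroring partitions rather than whole disks sidesteps the problem.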

Speaking of metadata, both GPT and gmirror can get confused if they find any old tables or labels on a “new” disk. GPT will find old backup partition tables and try to restore them for you, and gmirror will recognise old drives as containing precious data and dig its heels in when you try to overwrite it. Both gpart and gmirror have commands to erase their metadata, but I prefer to use dd to overwrite the whole disk with zeros anyway before re-use. This checks that the disk is actually good, which is nice to know up-front. You could just erase the start and end if you were in a hurry and wanted to calculate the offsets.
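If you are in that hurry, the offsets aren’t hard to work out. The sketch below only prints the dd commands rather than running them, so you can read them before committing. The function name, the 1 MiB wipe size and the default 512-byte sector size are my assumptions, not anything gmirror or gpart requires.

```shell
#!/bin/sh
# Sketch only: print (do not run) dd commands that zero the first and
# last 2048 sectors of a disk, which covers the MBR or GPT at the front
# and the backup GPT plus gmirror metadata at the end.
#   $1 = device name, e.g. ada1
#   $2 = media size in bytes
#   $3 = sector size in bytes (assumed to be 512 if omitted)
wipe_ends_cmds() {
    dev=$1
    size_bytes=$2
    sector=${3:-512}
    count=2048
    total=$(( size_bytes / sector ))   # total sectors on the disk
    seek=$(( total - count ))          # first sector of the tail region
    echo "dd if=/dev/zero of=/dev/$dev bs=$sector count=$count"
    echo "dd if=/dev/zero of=/dev/$dev bs=$sector count=$count seek=$seek"
}
```

On FreeBSD the media size in bytes should be the third field printed by “diskinfo ada1”, so something like wipe_ends_cmds ada1 "$(diskinfo ada1 | awk '{print $3}')" ought to do it, but check the printed commands (and the device name, twice) before pasting them into a root shell.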

The next thing you’ll need to do is load the geom_mirror kernel module. Either recompile the kernel with it added or, if this fills you with horror, just add ‘geom_mirror_load="YES"’ to /boot/loader.conf. This does bring it in early enough in the process to let you boot from it. The loader will boot from one drive or the other and then switch to mirror mode when it’s done.
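The loader.conf setting follows the usual modulename_load pattern, so for this module the line is geom_mirror_load="YES". If you would rather script it than open an editor, this is one way it might be done; the function name is made up, and it takes the file path as an argument so you can try it on a scratch copy first.

```shell
#!/bin/sh
# Sketch only: add the geom_mirror loader setting to a loader.conf file
# unless that exact line is already there.
#   $1 = path to the file (defaults to /boot/loader.conf)
enable_gmirror_at_boot() {
    conf=${1:-/boot/loader.conf}
    line='geom_mirror_load="YES"'
    # -x matches whole lines only, -F treats the pattern as plain text
    grep -qxF "$line" "$conf" 2>/dev/null || echo "$line" >> "$conf"
}
```

Running it twice leaves a single copy of the line, so it is safe to include in a setup script that gets re-run.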

So, at this point, you’ve set up FreeBSD as you like on one drive (ada0), selecting BSDlabels or MBR as the partition method and UFS as the file system. You’ve set it to load the geom_mirror module in loader.conf. You’re now looking at a root prompt on the console, and I’m assuming your drives are ada0 and ada1, and you want to call your mirror gm0.

Try this:

gmirror label gm0 ada0

Did it work? Well it used to once, but now you’ll probably get an error message saying it could not write metadata to ada0. If (when) this happens I know of one answer, which I found after trying everything else. Don’t be tempted to try everything else yourself (such as seeing if it works with ada1). Anything you do will either fail if you’re lucky, or make things worse. So just reboot, and select single-user mode from the loader menu.

Once you’re at the prompt, type the command again, and this time it should say that gm0 is created. My advice is to now reboot rather than getting clever.

When you do reboot it will fail to mount the root partition and stop, asking for help to find it. Don’t panic. We know where it’s gone. Mount it with “ufs:/dev/mirror/gm0s1a” or whatever slice you had it on if you’ve tried to be clever. Forgot to make a note? Don’t worry, somewhere in the boot log visible on the screen it actually tells you the name of the partition it couldn’t find.

After this you should be “in”. And to avoid this inconvenience next time you boot you’ll need to tweak /etc/fstab using an editor of your choice, although real computer nerds only use vi. What you need to do is replace all references to the actual drive with the gm0 version. Therefore /dev/ada0s1a should be edited to read /dev/mirror/gm0s1a. On a current default install, which no longer splits the drive into multiple file systems, this will only apply to the root mount point and the swap partition.
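For those who don’t count themselves as real computer nerds, the same substitution can be sketched with sed. The function name is hypothetical, and it deliberately writes the result to standard output instead of editing the file in place, so nothing changes until you have looked at it.

```shell
#!/bin/sh
# Sketch only: print an fstab with device paths on one disk rewritten
# to point at the mirror instead. Only lines that start with the device
# path are touched; comments and other devices are left alone.
#   $1 = fstab file to read
#   $2 = disk name, e.g. ada0
#   $3 = mirror name, e.g. gm0
fstab_to_mirror() {
    sed "s|^/dev/$2|/dev/mirror/$3|" "$1"
}
```

Something like fstab_to_mirror /etc/fstab ada0 gm0 > /tmp/fstab.new, followed by a careful read of the result and a backup copy of the original, is a reasonably safe way to use it.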

Save this, reboot (to test) and you should be looking good. Now all that remains is to add the second drive (ada1 in the example) with the line:

gmirror insert gm0 ada1

You can see the effect by running:

gmirror status

Unless your drive is very small, gm0 will be DEGRADED and it will say something about being rebuilt. The precise wording has changed over time. Rebuilding takes hours, not seconds, so leave it. Did I mention it’s a good idea to do this when the system isn’t busy?
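If you want a script to keep an eye on the rebuild for you, the status column can be picked out with awk. This is a sketch under the assumption that the output has the mirror name in the first column and its state in the second; as noted, the wording has changed over time, so check it against what your system actually prints. The function name is my own.

```shell
#!/bin/sh
# Sketch only: read "gmirror status" output on standard input and print
# the state (e.g. DEGRADED or COMPLETE) of the mirror named in $1.
# Assumes lines of the form: mirror/<name>  <STATE>  <component> ...
mirror_state() {
    awk -v m="mirror/$1" '$1 == m { print $2 }'
}

# Possible use: poll once a minute until the mirror reports COMPLETE.
#   while [ "$(gmirror status | mirror_state gm0)" != "COMPLETE" ]; do
#       sleep 60
#   done
```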

5 Replies to “ZFS is not always the answer. Bring back gmirror!”

  1. Nice, thanks.

    I love ufs, gmirror and gpart. ZFS as well, but still not able to replace some workloads on my (probably too old for 2023) servers.

    Just a note: instead of MBR, why not create GPT and mirror partitions?

    e.g. for whole disks da0 and da1 the mirror will be:

    gpart create -s GPT da0
    gpart add -t freebsd-ufs da0
    gpart backup da0 | gpart restore da1

    gmirror label gm0 da0p1 da1p1

    In this case there’s no overlap between the GPT and gmirror metadata, large disks are supported, and so on.
    In addition, gpart aligns the partition start properly for disks with a 4096-byte sector size.

    1. I’m worried that the gmirror and GPT data will (or might) collide with this. IIRC gmirror uses the last sector on the drive, which is exactly where GPT stores its backup copy. This might have changed since I wrote the post – it would be possible to modify gmirror to use the second last sector, for example.

      If your scheme is working then I don’t know why – unless the 4K sectors and blocks are creating a gap.

      One way you could use GPT and gmirror at the same time is if you mirrored the partitions rather than the entire disk. For example, if your GPT partitions are called tom, dick and harry you could clone the disk as you suggest and then label the partitions with “gmirror label -h tom /dev/gpt/tom0 /dev/gpt/tom1” and so on. Do this before loading the kernel module and it should mirror them all when you do.

      It’d be a bit of a pain replacing a failed disk – you’d have to create it with the same partition information and forget/insert them all individually.

      But I may be out of date and it may work on the entire disk now :-)

      1. Yes, exactly what I do – mirror (gpt) partitions, not disks.
        I use this widely, no gaps / overlaps:
        gpt’s metadata in last sector of da0
        gmirror’s metadata in last sector of da0p1

        About the pain of replacing a disk – only one extra command:
        gpart backup da0 | gpart restore da1
        assuming da0 is the live disk and da1 is the new one.

        and
        gmirror forget + gmirror insert – these are required anyway.

  2. Thanks Frank!

    It’s great to see someone else still wanting to use gmirror in 2017 8-)

    It’s 2018 now, and one question I wonder if you’ve solved: What if your twin disks are 4TB or larger?

    Is there some way to still boot off of these drives using the MBR partition scheme with whole disk gmirror?

    I hope so. Thank You!

    johnea

    1. Hi Johnea,

      I haven’t tried to do exactly what you’re doing, however it should work. Most of the time.

      MBR only allows 4G (2^32) sectors on a disk, and with traditional 512-byte (1/2 kilobyte) sectors this means 2TB maximum. However, AFD drives use 4096-byte sectors so you should be okay. Most >2TB drives are AFD.

      The reason I haven’t tried this is that I have never had the need to – I use GPT instead. But GPT and GEOM Mirror are incompatible, right? Wrong. Well mostly wrong.

      There is a problem. GPT keeps a second copy of the partition table at the END of the drive. This is in the same spot that GEOM Mirror keeps its private data too.

      The simple solution is to mirror partitions rather than the whole drive, even if the whole drive is one large partition. I’ve done this and kept things bootable by simply copying the first drive, but I can’t remember the twist – having the same UUIDs identifying partitions is bad news but I don’t remember how I dealt with it.

      Basically, if I have huge file stores I use ZFS and keep UFS for smaller fast system drives and databases.
