
I started writing the last post as a discussion of ZFS and UFS, and it ended up as an explainer of how UFS is still viable with gmirror. You need to read it to understand the issues if you want redundant storage. But in simple terms, as to which is better: ZFS is. Except when UFS has the advantage.
UFS had a huge problem. If the music stopped (the kernel crashed or the power was cut) the file system was left in a huge mess, because it wasn't updated in a safe order as it went along. This file system was also known as FFS (the Berkeley Fast File System), but it was more or less the same thing, and is now history. UFS2 came along (and JFS2 on AIX), with journaling (soft updates journaling on FreeBSD) so that if the music stopped it could probably catch up with itself when the power came back. We're really talking about UFS2 here, which is pretty solid.
Then along comes ZFS, which combines a next-generation volume manager and a next-generation file system in one. In terms of features and redundancy it's way ahead. Some key advantages are built-in and very powerful RAID, Copy-on-Write for referential integrity following a problem, snapshots, compression, scalability – the list is long, and most of it takes only a line or two to set up, as the sketch below shows. If you want any of these good features you probably want ZFS. But there are two instances where you want UFS2.
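By way of illustration – the pool name tank and the disk names here are made up, and lz4 is just one of the compression algorithms on offer:

    # A single-parity raidz pool across three disks, with compression turned on
    zpool create tank raidz /dev/ada1 /dev/ada2 /dev/ada3
    zfs set compression=lz4 tank

    # An instant snapshot of the pool's root dataset
    zfs snapshot tank@before-upgrade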
Cost
The first problem with ZFS is that all this good stuff comes at a cost. It's not a huge cost by modern standards – I've always reckoned an extra 2GB of RAM for the cache and suchlike covers the resource and performance issues. But on a very small system, 2GB of RAM is significant.
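You can at least stop the ZFS cache (the ARC) grabbing whatever it likes by capping it in /boot/loader.conf. A minimal sketch – the 512M figure is purely illustrative, and on FreeBSD 13 and later the tunable is spelled vfs.zfs.arc.max rather than vfs.zfs.arc_max:

    # Cap the ARC on a small machine (value illustrative)
    vfs.zfs.arc_max="512M"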
The second problem is more nuanced: Copy-on-Write. Basically, to get the referential integrity and the snapshots, when you change the contents of a block within a file ZFS doesn't overwrite that block in place – it writes a new block in free space. If the old block isn't needed as part of a snapshot it will be marked as free space afterwards. This means that if there's a failure while the block is half written, no problem – the old block is still there and the write simply never happened. Reboot and you're at the last consistent state, no more than five seconds before some idiot dug up the power cable.
Holy CoW!
So Copy-on-Write makes sense in many ways, but as you can imagine, if you're changing small pieces of a large random-access file, that file is going to end up seriously fragmented. And there's no way to defragment it. This is exactly what a database engine does to its files. Database engines enforce their own referential integrity using synchronous writes, so they're going to be consistent anyway – but if you're insisting that all transactions in a group are written in order, synchronously, and the underlying file system is spattering blocks all over the disk before returning, you've got a double whammy: fragmentation and slow write performance. You can put in a lot of cache to try and hide the problem, but you can't cache a write if the database insists it won't proceed until the data is actually stored on disk.
In this one use case, UFS2 is a clear winner. It also doesn't degrade as badly as the disk becomes full. (The ZFS answer is that if the zpool is approaching 80% capacity, add more disks.)
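You can keep an eye on both figures from the command line – zpool list reports capacity used and the fragmentation of the pool's free space (tank being an example pool name):

    zpool list -o name,size,cap,frag tank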
Best of Both
There is absolutely nothing stopping you having ZFS and UFS2 on the same system – on the same drives, even. Just create a partition for your database, format it using newfs and mount it on the ZFS tree wherever it's needed. You probably want it mirrored, so use gmirror. You won't be able to snapshot it or otherwise back it up the ZFS way while it's running, but you can dump it to a ZFS dataset and have that replicated along with all the others.
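A minimal sketch of the whole dance, assuming two spare partitions – the partition names (ada0p4, ada1p4), the mirror label gm0, the mount point and the backup paths are all made up for the example:

    # Load the mirror module now, and have it load at every boot
    gmirror load
    echo 'geom_mirror_load="YES"' >> /boot/loader.conf

    # Mirror the two spare partitions and put UFS2 (with soft updates) on the result
    gmirror label -v gm0 /dev/ada0p4 /dev/ada1p4
    newfs -U /dev/mirror/gm0

    # Mount it inside the ZFS tree
    mkdir -p /var/db/postgres
    mount /dev/mirror/gm0 /var/db/postgres

    # Dump it to a ZFS dataset; -L lets dump work on the live, mounted file system
    dump -0Lauf - /var/db/postgres | gzip > /tank/backup/db.dump.gz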
You can also boot off UFS2 and create a zpool on additional drives or partitions if you prefer, mounting the datasets on the UFS tree. Before FreeBSD 10 had full support for booting directly off ZFS, this was the normal way of using it. That said, the advantages of having the OS on ZFS (easy backup, snapshot and restore) mean it's probably preferable to use it for the root.
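Going that way round is short work too – again, the pool and disk names are only examples:

    # Mirror two whole disks into a pool and hang a dataset off the UFS tree
    zpool create tank mirror /dev/ada1 /dev/ada2
    zfs create -o mountpoint=/usr/home tank/home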

