
I started writing the last post as a discussion of ZFS and UFS, and it ended up as an explainer of how UFS is still viable with gmirror. You need to read it to understand the issues if you want redundant storage. But in simple terms, as to which is better: ZFS is. Except when UFS has the advantage.
UFS had a huge problem. If the music stopped (the kernel crashed or the power was cut) the file system was left in a huge mess, because it wasn't updated in a safe order as it went along. This file system was also known as FFS (the Berkeley Fast File System), but it was more or less the same thing, and is now history. UFS2 came along (and JFS2 on AIX), with journaling (soft updates journaling on FreeBSD) so that if the music stopped it could probably catch up with itself when the power came back. We're really talking about UFS2 here, which is pretty solid.
Then along comes ZFS, which combines a next-generation volume manager and a next-generation file system in one. In terms of features and redundancy it's way ahead. Some key advantages are built-in and very powerful RAID, Copy-on-Write for referential integrity following a problem, snapshots, compression, scalability – the list is long, and most of it takes only a line or two to set up, as the sketch below shows. If you want any of these good features you probably want ZFS. But there are two instances where you want UFS2.
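By way of illustration – the pool name tank and the disk names here are made up, and lz4 is just one of the compression algorithms on offer:

    # A single-parity raidz pool across three disks, with compression turned on
    zpool create tank raidz /dev/ada1 /dev/ada2 /dev/ada3
    zfs set compression=lz4 tank

    # An instant snapshot of the pool's root dataset
    zfs snapshot tank@before-upgrade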
Cost
The first problem with ZFS is that all this good stuff comes at a cost. It's not a huge cost by modern standards – I've always reckoned an extra 2GB of RAM for the cache and suchlike covers the resource and performance issues. But on a very small system, 2GB of RAM is significant.
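You can at least stop the ZFS cache (the ARC) grabbing whatever it likes by capping it in /boot/loader.conf. A minimal sketch – the 512M figure is purely illustrative, and on FreeBSD 13 and later the tunable is spelled vfs.zfs.arc.max rather than vfs.zfs.arc_max:

    # Cap the ARC on a small machine (value illustrative)
    vfs.zfs.arc_max="512M"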
The second problem is more nuanced: Copy-on-Write. Basically, to get the referential integrity and the snapshots, when you change the contents of a block within a file ZFS doesn't overwrite that block in place – it writes a new block in free space. If the old block isn't needed as part of a snapshot it will be marked as free space afterwards. This means that if there's a failure while the block is half written, no problem – the old block is still there and the write simply never happened. Reboot and you're at the last consistent state, no more than five seconds before some idiot dug up the power cable.
Holy CoW!
So Copy-on-Write makes sense in many ways, but as you can imagine, if you're changing small pieces of a large random-access file, that file is going to end up seriously fragmented. And there's no way to defragment it. This is exactly what a database engine does to its files. Database engines enforce their own referential integrity using synchronous writes, so they're going to be consistent anyway – but if you're insisting that all transactions in a group are written in order, synchronously, and the underlying file system is spattering blocks all over the disk before returning, you've got a double whammy: fragmentation and slow write performance. You can put in a lot of cache to try and hide the problem, but you can't cache a write if the database insists it won't proceed until the data is actually stored on disk.
In this one use case, UFS2 is a clear winner. It also doesn't degrade as badly as the disk becomes full. (The ZFS answer is that if the zpool is approaching 80% capacity, add more disks.)
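You can keep an eye on both figures from the command line – zpool list reports capacity used and the fragmentation of the pool's free space (tank being an example pool name):

    zpool list -o name,size,cap,frag tank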
Best of Both
There is absolutely nothing stopping you having ZFS and UFS2 on the same system – on the same drives, even. Just create a partition for your database, format it using newfs and mount it on the ZFS tree wherever it's needed. You probably want it mirrored, so use gmirror. You won't be able to snapshot it or otherwise back it up the ZFS way while it's running, but you can dump it to a ZFS dataset and have that replicated along with all the others.
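A minimal sketch of the whole dance, assuming two spare partitions – the partition names (ada0p4, ada1p4), the mirror label gm0, the mount point and the backup paths are all made up for the example:

    # Load the mirror module now, and have it load at every boot
    gmirror load
    echo 'geom_mirror_load="YES"' >> /boot/loader.conf

    # Mirror the two spare partitions and put UFS2 (with soft updates) on the result
    gmirror label -v gm0 /dev/ada0p4 /dev/ada1p4
    newfs -U /dev/mirror/gm0

    # Mount it inside the ZFS tree
    mkdir -p /var/db/postgres
    mount /dev/mirror/gm0 /var/db/postgres

    # Dump it to a ZFS dataset; -L lets dump work on the live, mounted file system
    dump -0Lauf - /var/db/postgres | gzip > /tank/backup/db.dump.gz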
You can also boot off UFS2 and create a zpool on additional drives or partitions if you prefer, mounting the datasets on the UFS tree. Before FreeBSD 10 had full support for booting directly off ZFS, this was the normal way of using it. That said, the advantages of having the OS on ZFS (easy backup, snapshot and restore) mean it's probably preferable to use it for the root.
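Going that way round is short work too – again, the pool and disk names are only examples:

    # Mirror two whole disks into a pool and hang a dataset off the UFS tree
    zpool create tank mirror /dev/ada1 /dev/ada2
    zfs create -o mountpoint=/usr/home tank/home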

