FreeBSD 10.0 and ZFS

It’s finally here: FreeBSD 10.0 with ZFS. I’ve been pretty happy for many years with twin-drive systems protected using gmirror and UFS. It does what I want. If a disk fails, gmirror drops it out and sends me an email, but otherwise carries on. When I put in a replacement blank disk it can re-build the mirror. If I take one disk out, put it into another machine and boot it, it’ll wake up happy. It’s robust!

So why mess around with ZFS, the system that puts your drives into a pool and decides where things are stored, so you don’t have to worry your pretty little head about it? The snag is that the old ways are dying out, and sooner or later you’ll have no choice.

Unfortunately, the transition hasn’t been that smooth. First off you have to consider 2TB+ drives and how you partition them. MBR partition tables store sector numbers in 32 bits, so with 512-byte sectors they run out at 2TB, although AF drives with larger sectors can bodge around this. It can get messy though, as many systems expect 512-byte sectors, not 4K, so everything has to be AF-aware. In my experience, it’s not worth the hassle.
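
The sums behind that limit are easy to check. This is only a back-of-envelope sketch, assuming the classic MBR layout with 32-bit sector counts; the function name is made up for illustration.

```python
# Back-of-envelope check of the MBR size limit.
# Assumption: sector addresses and counts in an MBR entry are 32-bit values.
MBR_MAX_SECTORS = 2**32


def mbr_limit_bytes(sector_size):
    """Largest capacity (in bytes) addressable with 32-bit sector numbers."""
    return MBR_MAX_SECTORS * sector_size


for size in (512, 4096):
    limit = mbr_limit_bytes(size)
    print(f"{size}-byte sectors: {limit // 2**40} TiB")
```

With 512-byte sectors this comes out at 2 TiB, and with 4K sectors at 16 TiB, which is why larger sectors can bodge around the problem, provided everything in the chain really does speak 4K.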

The snag with the new and limitless “GPT” scheme is that it keeps a backup copy of the partition table at the end of the disk, as well as the one at the start. This tends to be where gmirror stores its metadata too. You can’t mix gmirror and GPT. Although the code is hackable, I’ve got better things to do.
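
To make the clash concrete, here is a rough sketch of who wants which sectors at the end of a whole disk. It assumes the usual GPT defaults (128 entries of 128 bytes, backup header in the very last sector) and that gmirror keeps its metadata in the last sector of the provider; the function names and the nominal 3TB size are just for illustration.

```python
# Rough layout of the tail end of a disk, in sector numbers.
# Assumptions: typical GPT defaults, and gmirror metadata in the
# provider's last sector.
SECTOR_SIZE = 512
GPT_ENTRIES = 128
GPT_ENTRY_SIZE = 128


def gpt_backup_sectors(total_sectors, sector_size=SECTOR_SIZE):
    """Sectors holding the backup GPT: entry array, then header in the last sector."""
    table_bytes = GPT_ENTRIES * GPT_ENTRY_SIZE
    entry_sectors = (table_bytes + sector_size - 1) // sector_size  # round up
    first = total_sectors - 1 - entry_sectors
    return range(first, total_sectors)


def gmirror_metadata_sector(total_sectors):
    """Sector gmirror uses for metadata when mirroring the whole disk."""
    return total_sectors - 1


total = 3 * 10**12 // SECTOR_SIZE  # a nominal 3TB drive
clash = gmirror_metadata_sector(total) in gpt_backup_sectors(total)
print("Both want the last sector:", clash)
```

Both schemes lay claim to the final sector of the disk, which is the root of the trouble.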

So the good news is that ZFS does actually work as a replacement for gmirror. To test it I stuck two new 3TB AF drives into a server and installed 10.0 using the new procedure, selecting the ZFS-on-root menu option and GPT partitioning. This is shown in the menu as “Experimental”, but seems to work. What you end up with, if you select two drives and say you want a ZFS mirror, is just that.

Being the suspicious type, I pulled each of the drives in turn to see what would happen, and the system continued without missing a beat, just like gmirror did. There were also a couple of nice surprises when I stuck the drives back in and “onlined” them:

First off, the re-build was almost instant. Secondly, HP’s “non-hot-swap” drive bays work just fine for hot-swap under FreeBSD/ZFS. I’d always suspected this was a Windoze nonsense. All good news.

So why is the re-build so fast? It’s obvious when you consider what’s going on. The GEOM system works at block level. If the mirror is broken it has no way of telling which blocks are valid, so the only option is to copy them all. A major feature of ZFS, however, is that the directories and files have validation codes (checksums) in the blocks above, going all the way to the root. Therefore, by starting at the root and chaining down, it’s easy to find the blocks containing changed data, and copy only those. Nice! Getting rid of separate volume managers and file systems has its advantages.
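
The idea of chaining down from the root can be shown with a toy model. This is emphatically not ZFS’s on-disk format or its actual re-build code, just a minimal sketch of a checksum tree; the `Block` class, `changed_blocks` function and the example file names are all made up for illustration.

```python
import hashlib


class Block:
    """A toy block: a leaf holding data, or a parent holding named children."""

    def __init__(self, data=b"", children=None):
        self.data = data
        self.children = children or {}

    def checksum(self):
        # A parent's checksum covers its children's checksums, so any change
        # lower down alters every checksum on the path up to the root.
        h = hashlib.sha256(self.data)
        for name in sorted(self.children):
            h.update(name.encode())
            h.update(self.children[name].checksum().encode())
        return h.hexdigest()


def changed_blocks(current, stale, path=""):
    """Return paths of leaf blocks that differ between the two trees."""
    if stale is not None and current.checksum() == stale.checksum():
        return []  # whole subtree matches, nothing below needs copying
    if not current.children:
        return [path or "/"]
    out = []
    for name in sorted(current.children):
        old = stale.children.get(name) if stale is not None else None
        out += changed_blocks(current.children[name], old, f"{path}/{name}")
    return out


def tree(motd):
    """Build a small example tree with one variable file."""
    return Block(children={
        "etc": Block(children={"motd": Block(motd), "rc.conf": Block(b"config")}),
        "home": Block(children={"notes": Block(b"hello")}),
    })


current = tree(b"new message")   # the disk that stayed in the machine
stale = tree(b"old message")     # the disk that was pulled and put back
print(changed_blocks(current, stale))  # ['/etc/motd']
```

Only the one changed file is reported; the `home` subtree is skipped after a single checksum comparison. A block-level mirror has no such tree to consult, so it has to copy everything.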

So am I comfortable with ZFS? Not yet, but I’m a lot happier with it when it’s a complete, integrated solution. Previously I’d only been using it on data drives in multi-drive configurations, as although it was possible to install root on ZFS, it was a real PITA.

IP Expo 2011 – what was fun

That’s IP Expo over with for another year. I’ve never quite got what the show is about, but it’s one I seriously consider attending. Its lack of focus is probably what makes it interesting. I’ve always suspected that some exhibition organiser kept reading about IP and decided it was a buzzword lacking its own show and started one. Anything connected to an IP network is fair game, and these days this means almost everything.

The Violin memory box is an amazing piece of kit – a massive, high-performance thumb drive connected via fibre channel. They’ve done a lot of work basically striping data across flash modules, which boosts performance, avoids hitting the same flash chip repeatedly and gives redundancy – I believe they can lose six modules before it bites, and it’s hot-swappable.

There were quite a lot of other storage solutions on show, some interesting, some very much the same. One company is using ZFS, which is a technology I’ve had my eye on for some time.

Prize for the fun gadget goes to Pelco’s thermal imaging camera – at less than £2K for the low-res version it suddenly becomes affordable, and it certainly works well enough. Still on CCTV, someone had a monitor connected to a web cam and some software to identify faces. Spooky. This put a mug-shot of everyone looking at the camera down the side of the screen, recorded how long they were standing there and guessed their sex and age. It actually took ten years off most people, which helped with the feel-good factor, but this technology obviously works. An obvious application is snooping on people looking at shop windows to work out what attracts the right kind of demographic (why else would they have developed it?). I should point out that the stand was really showing off the screen – the web-cam and face recognition were just a crowd-puller.

Another interesting bit of kit is an LG stand-alone VMware terminal. This basically allows you to virtualise your PCs and use them on a thin client. The implications of this for manageability are obvious – keep your PC environment in a server room, where it can be cloned and configured at will, and leave a dumb-terminal in the front line. If the terminal breaks or is stolen – no problem whatsoever. The snag? Well, the terminals aren’t cheap and they could do with toughened glass.