Swapper – Frank Leonhardt's Blog

00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00000400 01 00 00 00 ff ff 03 00 00 00 00 00 ad 40 e5 4b |.............@.K| 00000410 87 1a 46 d2 b0 36 ee 60 6d 40 08 d1 00 00 00 00 |..F..6.`m@......| 00000420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00000ff0 00 00 00 00 00 00 53 57 41 50 53 50 41 43 45 32 |......SWAPSPACE2| 00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 40000000

union swap_header { struct { char reserved[PAGE_SIZE - 10]; char magic[10]; } magic; struct { char bootbits[1024]; __u32 version; __u32 last_page; __u32 nr_badpages; __u32 padding[117]; __u32 badpages[1]; } info; };

Adding extra swap space to FreeBSD is easy, right? Just find a spare block storage device and run swapon with its name as an argument. I’ll put a step-by-step on how you actually do this at the end of the post in case this is news to you.

However, I’ve just found a very interesting gotcha, which could bite anyone running a 64-bit kernel and 8Gb+ of RAM.

From here we’re getting into the FreeBSD kernel – if you just want to know how to set up a lot of swap space, skip to the end…

I’ve been running a program to process a very large XML file into a large binary file – distilling 100Gb of XML into 1Gb of binary. This is the excuse for needing 16Gb of working storage (please excuse my 1970’s computer science terminology, but it’s a lot more precise than the modern “memory” and it makes a difference here).

I was using 2Gb of core and 8Gb of swap space, but this was too little so I added an extra 32Gb of swap file. Problem sorted? Well top and vmstat both reported 40Gb of swap space available so it looked good. However, on running the code it bombed out at random, with an enigmatic message “Killed” on the user console. Putting trace lines in the code narrowed it down to a random point while traversing a large array of pointers to pointers to the 15Gb heap, about an hour into the run. It looked for all the world like pointer corruption causing a Segmentation Fault or Bus Error, but if the process had a got that kind of signal it should have done a core dump, and it wasn’t happening. The output suggested a SIGKILL. But it wasn’t me sending it, and there were no other users logged in. Even a stack space error, which might have happened as qsort() was involved, was ruled out as the cause – and the kernel would have sent an ABORT, not a KILL in this case.

I finally tracked it down to a rather interesting “undocumented” feature. Within the kernel there is a structure called swblock in the John Dyson/Matthew Dillon VM handler, and a pointer called “swap” points to a chain of these these structures. Its size is limited by the value of kern.maxswzone, which you can tweak in /boot/loader.conf. The default (AMD64 8.2-Release) allows for about 14Gb of swap space, but because it’s a radix tree you’ll probably get a headache if you try to work it out directly. However, if you increase the swap space beyond this it’ll report as being there, but when when you try to use the excess, crunch!

Although this variable is tunable, it’s also hard-limited in include/param.h to 32M entries; each entry can manage 16 pages (if I’ve understood the code correctly). If you want to see exactly what’s happening, look at vm/swap_pager.c.

The hard limit to the size number of swblock entries is set as VM_SWZONE_SIZE_MAX in include/param.h. I have no idea why, and I haven’t yet tried messing with it as I have no need.

So, what was happening to my process? Well it was being killed by vm_pageout_oom() in vm/vm_pageout.c. This gets called when swap space OR the swblock space is exhausted, either in vm_pageout.c or swap_pager.c. In some circumstances it prints “swap zone exhausted, increase kern.maxswzone\n” beforehand, but not always. It’s effect is to find the largest running non-system process on the system and shoot it using killproc().

Mystery solved.

So, here’s how to set up 32Gb of USABLE swap space.

First, find your swap device. You can have as many as you want. This is either a disk slice available in /dev or, if you want to swap to a file, you need ramdisk to do the mapping. You can have as many swap devices as you like and FreeBSD will balance their use.

If you can’t easily add another drive, you’re best option is to add an extra swap file in the form of a ram disk on the existing filing system. You’ll need to be the root user for this.

To create a ram disk you’ll need a file to back it. The easy way to create one is using “dd”:

dd if=/dev/zero of=/var/swap0 bs=1G count=32

This creates a 32G file in /var filled with nulls – adjust as required.

It’s probably a good idea to make this file inaccessible to anyone other than root:

chmod 0600 /var/swap0

Next, create your temporary file-backed RAM disk:

mdconfig -a -t vnode -f /var/swap0 -u 0

This will create a device called /dev/md with a unit number specified by -u; in this case md0. The final step is to tell the system about it:

swapon /dev/md0

If you wish, you can make this permanent by adding the following to /etc/rc.conf:

swapfile="/var/swap0"

Now here’s the trick – if your total swap space is greater than 14Gb (as of FreeBSD 8.2) you’ll need to increase the value of kern.maxswzone in /boot/loader.conf. To check the current value use:

sysctl kern.maxswzone

The default output is:

kern.maxswzone: 33554432

That’s 0x2000000 32M. For 32Gb of VM I’m pretty sure you’d be okay with 0x5000000 (in round numbers), which translates to 83886080, so add this line to /boot/loader.conf (create the file if it doesn’t exist) and reboot.

kern.maxswzone="83886080"

Tag: Swapper

Linux swap files

Large swap files on FreeBSD die with mystery “Killed” – howto add lots of swap space

So, here’s how to set up 32Gb of USABLE swap space.