Mirrored swap devices

Although some of this is BSD specific, the principles apply to any Unix or Linux.

When you install your Unix-like OS across several disks, either with a mirror or RAID system (particularly ZFS RAIDZ), you’ll be asked if you want to set up a swap partition, and if you want it mirrored.

The default (for FreeBSD) is to add a swap partition on every disk and not mirror it. This is actually the most efficient configuration apart from having dedicated swap drives, but is also a spectacularly bad idea. More on this later.

What is a swapfile/drive anyway?

The name is a hangover from early swapping multitasking systems. Only a few programs could fit in main memory, so when their time allocation ran out they were swapped out to disk in favour of others until it was their turn again.

These days we have “virtual memory”, where a Memory Management Unit (MMU) lets fixed-size blocks of memory, known as pages, be stored on disk when not in use and loaded automatically when needed again. This is much more effective than swapping out entire programs but needs MMU hardware, which was once complex, slow and expensive.

So the swap partition should really be called the paging partition now, and Microsoft actually got the name right on Windows. But we still call it the swap partition.

What you need to remember is that parts of a running program’s memory may be in the swap partition instead of RAM at any time, and that includes parts of the operating system.

Strategies

There are several ideas for swap partitions in the 2020s.

No swap partition

Given RAM is so cheap, you can decide not to bother with one, and this is a reasonable approach. Virtual memory is slow, and if you can, get RAM instead. It can still pay to have one though, as some pages of memory are rarely, if ever, used again once created. Parts of a large program that aren’t actually used, and so on. The OS can recognise this and page them out, using the RAM for something useful.

You may also encounter a situation where the physical RAM runs out, which will mean no further programs can be run and those already running won’t be able to allocate any more. This leads to two problems. Firstly, “Developers” don’t often program for running out of memory, and their software doesn’t handle the situation gracefully. Secondly, if the program you need to run is your login shell, you’ll be locked out of your server.

For these reasons I find it better to have a swap partition, but install enough RAM that it’s barely used. As a rule of thumb, I go for having the same swap space as there is physical RAM.
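To put a number on that rule of thumb, here’s a small sketch that turns a RAM size in bytes into whole GiB, rounded up, ready to use as a swap size. The function name is made up for illustration, and hw.physmem in the usage line is FreeBSD’s sysctl for physical memory; on other systems you’d get the figure elsewhere.

```shell
#!/bin/sh
# Rule of thumb from the text: as much swap as physical RAM.
# Convert a RAM size in bytes to whole GiB, rounding up, because the
# figure the OS reports is usually a little under the installed RAM.
swap_gib_for_ram() {
    bytes=$1
    gib=1073741824
    echo $(( ($bytes + $gib - 1) / $gib ))
}
```

On FreeBSD, swap_gib_for_ram "$(sysctl -n hw.physmem)" on a 16 GiB machine should print 16.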

Dedicated Swap Drive(s)

This is the classic gold standard. Use a small, fast (and expensive) drive, preferably short stroked, so your virtual memory goes as fast as possible. If you’re really using VM this is probably the way to go, and having multiple dedicated drives spreads the load and increases performance.

Swap partition on single drive

If you’ve got a single drive system, just create a swap partition. It’s what most installers do.

Use a swap file

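You don’t need a drive or even a partition. Unix treats devices and files much the same, so you can create a normal file and use that. On FreeBSD you attach the file as a memory disk (md) device first and swap on that:

dd if=/dev/zero of=/var/swapfile bs=1m count=16384
chmod 0600 /var/swapfile
mdconfig -a -t vnode -f /var/swapfile -u 99
swapon /dev/md99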

You can swap on any number of files or drives, and use “swapoff” to stop using a particular one.

Unless you’re going for maximum performance, this has a lot going for it. You can allocate larger or smaller swap files as required and easily reconfigure a running system. Also, if your file system is redundant, your swap system is too.
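Commands typed in by hand like this only last until the next reboot. To make a swap file permanent, here’s a minimal sketch that prints (rather than runs) the commands for FreeBSD and, for comparison, Linux. The function name swapfile_cmds is hypothetical, the file name and size are examples, and the FreeBSD fstab line uses the md entry with file= and late options; check fstab(5) and swapon(8) on your release before relying on it.

```shell
#!/bin/sh
# Dry run: print the commands to create a persistent swap file.
# Nothing is executed; review the output, then run it as root.
# Usage: swapfile_cmds OS FILE SIZE_GIB   (OS as from uname -s)
swapfile_cmds() {
    os=$1
    file=$2
    mib=$(( $3 * 1024 ))
    case $os in
    FreeBSD)
        echo "dd if=/dev/zero of=$file bs=1m count=$mib"
        echo "chmod 0600 $file"
        # md entry: an md unit is attached to the file at boot;
        # 'late' waits until the file system holding it is mounted
        echo "echo 'md none swap sw,file=$file,late 0 0' >> /etc/fstab"
        echo "swapon -aL"
        ;;
    Linux)
        echo "dd if=/dev/zero of=$file bs=1M count=$mib"
        echo "chmod 0600 $file"
        echo "mkswap $file"
        echo "swapon $file"
        echo "echo '$file none swap sw 0 0' >> /etc/fstab"
        ;;
    *)
        echo "swapfile_cmds: unsupported OS: $os" >&2
        return 1
        ;;
    esac
}
```

For example, swapfile_cmds "$(uname -s)" /var/swapfile 16 prints the list for the machine you’re on.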

Multiple swap partitions

This is what the FreeBSD installer will offer by default if you set up a ZFS mirror or RAIDZ. It spreads the load across all drives. The only problem is that the whole point of a redundant drive system is that it will keep going after a hardware failure. With a bit of swap space on every drive, the system will fail if any of the drives fails, even if the filing system carries on. Any process with RAM paged out to swap gets knocked out, including the operating system. It’s like pulling out RAM chips and hoping it’s not going to crash. SO DON’T DO IT.
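To check whether a running FreeBSD box has this layout, here’s a small sketch that reads swapinfo output and lists any swap device that isn’t a gmirror device (/dev/mirror/…) or an md-backed swap file. The function name is hypothetical and the column layout is assumed from FreeBSD’s swapinfo. One line of output is the single-partition case; two or more means every listed drive can take the system down.

```shell
#!/bin/sh
# Read "swapinfo" output on stdin and print swap devices that are
# neither gmirror devices nor md devices (swap files).
# Skips the header line and the "Total" line.
unmirrored_swap() {
    awk 'NR > 1 && $1 != "Total" && $1 !~ /^\/dev\/(mirror|md)/ { print $1 }'
}
```

Usage: swapinfo | unmirrored_swap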

If you are going to use a partition on a data drive, just use one. On an eight-drive system the chance of a failure on any one of the eight drives is roughly eight times higher than on one specific unit, so you reduce the probability of failure considerably by putting all your eggs in one basket. Counterintuitive? Consider that with distributed swap, if one basket falls they all do anyway.
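The arithmetic behind “roughly eight times”, as a tiny awk sketch. It assumes drives fail independently, each with the same probability P over the same period, so the chance that at least one of N fails is 1 - (1 - P)^N. The function name is made up and the 2% figure in the usage line is purely illustrative, not a real failure rate.

```shell
#!/bin/sh
# Probability that at least one of N drives fails, given each fails
# independently with probability P: 1 - (1 - P)^N.
# Usage: any_fail P N
any_fail() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 1 - (1 - p) ^ n }'
}
```

With P = 0.02, any_fail 0.02 1 prints 0.0200 and any_fail 0.02 8 prints 0.1492, just under 8 × 0.02 = 0.16. For small P the risk from N drives is close to N times the risk from one.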

Mirrored swap drives/partitions

This is sensible. The FreeBSD installer will do this if you ask it, using geom mirror. I’ve explained gmirror in posts passim, and there is absolutely no problem mixing it with ZFS (although you might want to read earlier posts to avoid complications with GPT). But the installer will do it automatically, so just flip the option. It’s faster than a swap file, although this will only matter if your job mix actually uses virtual memory regularly. If you have enough RAM, it shouldn’t.

You might think that mirroring swap drives is slower – and to an extent it is. Everything has to be written twice, and the page-out operation will only complete when both drives have been updated. However, on a page-in the throughput is doubled, given the mirror can read either drive to satisfy the request. The chances are there will be about the same, or slightly more page-ins so it’s not the huge performance hit it might seem at first glance.
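If you didn’t flip the option at install time, you can still mirror swap afterwards. Here’s a dry-run sketch that prints the commands, assuming two existing freebsd-swap partitions called ada0p2 and ada1p2 and a mirror named swap (all examples; adjust to suit). The function name is hypothetical, and it doesn’t remove the old per-partition swap lines from /etc/fstab, which you’d need to delete by hand.

```shell
#!/bin/sh
# Dry run: print commands to mirror two existing swap partitions
# with gmirror. Nothing here touches the disks.
# Usage: mirror_swap_cmds NAME PART1 PART2
mirror_swap_cmds() {
    name=$1
    p1=$2
    p2=$3
    # The partitions can't be labelled while they are in use as swap
    echo "swapoff /dev/$p1 /dev/$p2"
    echo "kldload -n geom_mirror"
    # gmirror keeps its metadata in the last sector of each partition
    echo "gmirror label $name $p1 $p2"
    echo "echo 'geom_mirror_load=\"YES\"' >> /boot/loader.conf"
    echo "echo '/dev/mirror/$name none swap sw 0 0' >> /etc/fstab"
    echo "swapon /dev/mirror/$name"
}
```

Usage: mirror_swap_cmds swap ada0p2 ada1p2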

Summary

Method | Pros | Cons
No swap | Simple; fastest | Wastes RAM; can lead to serious problems if you run out of RAM
Dedicated swap drive(s) | Simple; optimal performance | Each drive is a single point of failure for the whole system
Multiple swap partitions | Improved performance; lower cost than dedicated | Each drive is a single point of failure for the whole system
Single swap partition (multi-drive system) | Simple; lower probability of the single point of failure occurring | Reduced performance; still has a single point of failure
Mirrored drives or partitions | No single point of failure for the whole system | Reduced performance
Swap file | Flexible, even on a live system; redundancy the same as the drive array | Reduced performance
Quick summary of different swap/paging device strategies.

Conclusion

Having swap partitions on multiple drives increases your risk of a fault taking down a server that would otherwise keep running. Either use mirrored swap partitions/drives, or use a swap file on redundant storage. The choice depends on the amount of virtual memory you use in normal circumstances.

Microsoft sued over Windows 11 debacle

I’m not normally a fan of vexatious litigation, but when someone decides to harass Microsoft over their outrageous move to force 240 million Windows PCs onto the scrap heap I can only applaud.

The heroic litigant is a chap called Lawrence Klein, and is from Southern California in case it wasn’t obvious.

He’s not actually after them for billions, but reckons they’re abusing monopoly power and wants the judge to force them to provide security updates for Windows 10 until it’s only 10% of the installed base. It seems very reasonable to me.

The complaint was filed in San Diego. Mr Klein is right on the button and isn’t holding back.

The text of the complaint can be found on the Courthouse News web site here.

FreeBSD/Linux as Fibre Broadband router

British Telecom, bless them, has decided that copper telephone lines have to go and is forcing everyone onto fibre Internet and VoIP. Except rural customers currently connected to the Internet using a wet piece of string if they’re lucky, of course.

Incidentally, “Fibre Broadband” is a nonsense in a technical sense but the battle is lost – the public believes Broadband is any Internet connection to the home that isn’t dial-up.

Although I’ve written about routing on FreeBSD before, I thought it was time for an update. Why route on FreeBSD? Because unlike the cheap and nasty “routers” supplied by domestic (and some commercial) ISPs, it doesn’t crash. You don’t have to turn it off and on again. And it does what it’s told, with great diagnostics. You can also run plenty of other services on the same box if it’s powerful enough, or your throughput is modest.

Most of this should work fine on Linux, although the networking is generally considered less efficient than the real thing. However, at less than 1Gbps on a single line this isn’t going to matter, if it matters at all. With Linux you get less of the nuts and bolts built in to the base system so you may have to install extra packages depending on which distribution you are using. But this is all standard stuff so shouldn’t be too difficult. It’s the settings that matter, and probably the reason you’re reading this!

In this first article I’ll just consider a gateway router with NAT, and leave DNS, DHCP and other options until later.

Setting up PPPoE using user-ppp

First off, your WAN connection. With FTTC and FTTP this is normally a little white box – either a VDSL modem or an ONT. It connects to the phone line or fibre cable on one end, and has an RJ45 on the other that looks like Ethernet, because it is Ethernet. I’m going to call them Ethernet Modems, as they’re treated the same for our purpose. However, being Ethernet won’t do you much good as it’s just talking a protocol called PPPoE – or Point-to-Point Protocol over Ethernet.

PPP is an old protocol for making an Internet connection using dial-up, but it’s evolved (or suffered mission creep) and it’s now rather complicated thanks to all the baggage. Fortunately you can ignore the baggage and concentrate on the PPPoE stuff, once you know which is which. And that’s always the trick.

You’ll need a host (i.e. computer) with two Ethernet ports unless you want a complicated life. If you’re using an old PC with just one you can get away with a USB3 Ethernet adapter, but having a couple of server-grade NICs on the motherboard or add-on cards is the best way to go. Very generally, Intel or Broadcom are good choices, Realtek is at the low end.

You need to connect your Ethernet Modem to one port on your host and the other port goes to the LAN.

If your Ethernet Modem and the host you’re planning to use as a router are in different places you can connect them using a VLAN. It’s proper Ethernet and can be switched. Without a VLAN it’s not so simple, so plug it in using a direct cable.

On FreeBSD, ppp is built in to the base system. Type ppp (as root) and it’ll start up in interactive mode. If it doesn’t, you’re not using BSD, so you lack a base system and will have to install a PPP package instead. You might like to start here: https://tldp.org/HOWTO/PPP-HOWTO/

Although you can compile PPP support into the kernel, the ppp we’re talking about is a program written by Toshiharu OHNO and Brian SOMERS in the early 1990s, and part of BSD since FreeBSD and OpenBSD 2. It’s the normal straightforward way of doing things.

ppp has a simple config file in /etc/ppp/ppp.conf. It can contain profiles for multiple services in sections, with the service name being arbitrary, and ending in a colon (“:”). You specify the service when you run it, and stuff in other sections is ignored. This is a hangover from the days when people had multiple dial-up connections.

Here’s a service definition for Cloudscape, one of my favourite ISPs, but other UK FTTP services will be similar or identical. UK FTTC and SoGEA modems are pretty much the same too.

cloudscape:
  delete default                # May already have a
                                # default route configured elsewhere
  set device PPPoE:bge1
  set authname user-name-supplied-by-ISP
  set authkey password-supplied-by-ISP
  set dial
  set login
  set lcp
  set mru 1492
  set mtu 1492
  disable ipv6cp              # Turn off IPv6
  enable ipcp                 # Turn on IPv4 (default)
#  enable lqr                 # Turn on Link Quality Requests
                              #   (detect dropped line)
  enable echo                 # Enable echo for LQR
  iface name wan0
  add default HISADDR

The ppp program was originally used for serial PPP connections to dial-up ISPs or organisations, but here we’re just using it for PPPoE. In support of switching ISPs it can add stuff to config files like resolv.conf and the routing table, which in the old days tended to be dynamic.

Feel free to read the manual for what each of the options above does. Briefly, I start by deleting the default route. It probably won’t exist unless you’ve configured one (possibly using DHCP), but if it does it will cause problems when ppp adds another.

  set device PPPoE:bge1

This says we’re using PPPoE over the bge1 Ethernet card. Obviously set this to the Ethernet card to which your Ethernet Modem (e.g. ONT) is attached.

  set authname user-name-supplied-by-ISP
  set authkey password-supplied-by-ISP

This is the user-name and password supplied by your ISP. These tend to be low security, but are needed for the protocol for historic reasons.

  set dial
  set login
  set lcp

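Given with no arguments, set dial and set login clear the dial and login chat scripts, which were for driving modems and aren’t needed with PPPoE. Some people will try to tell you that Internet lines are configured with DHCP – that’s for LANs. Over a point-to-point connection PPP does the job itself: LCP (Link Control Protocol) negotiates the link, and its sibling IPCP (IP Control Protocol) provides things like what your IP address is and which DNS servers to use.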

  set mru 1492
  set mtu 1492

PPPoE adds eight bytes of header to every packet (six for PPPoE itself and two for the PPP protocol field), so a full 1500-byte IP packet won’t fit 1:1 into a standard 1500-byte Ethernet frame. Reducing the MTU to 1492 gets around this and avoids fragmentation, which is a good thing. LCP might suggest or force a lower MTU but there’s no harm in specifying it.
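The numbers, worked through in shell arithmetic. The 6 + 2 split is the standard PPPoE header plus PPP protocol field; the TCP MSS line is an extra, assuming 20-byte IPv4 and TCP headers with no options, and is the figure you’d expect if something clamps the MSS for a PPPoE link.

```shell
#!/bin/sh
# PPPoE overhead: 6-byte PPPoE header + 2-byte PPP protocol field
ETHER_MTU=1500
PPPOE_HDR=6
PPP_PROTO=2
pppoe_mtu=$(( $ETHER_MTU - $PPPOE_HDR - $PPP_PROTO ))
# TCP MSS = MTU minus 20-byte IPv4 header and 20-byte TCP header
tcp_mss=$(( $pppoe_mtu - 20 - 20 ))
echo "PPPoE MTU: $pppoe_mtu"
echo "TCP MSS:   $tcp_mss"
```

This prints an MTU of 1492, matching the set mru and set mtu lines, and an MSS of 1452.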

  disable ipv6cp              # Turn off IPv6
  enable ipcp                 # Turn on IPv4 (default)
#  enable lqr                 # Turn on Link Quality Requests
                              #  (detect dropped line)
  enable echo                 # Enable echo for LQR

This disables IPv6 and enables IPv4 (which is on by default anyway). If you want to use IPv6 your service provider needs to support it, and most don’t.

LQR is probably not going to be necessary for our purposes and generates warnings, so I’ve left the line in but commented it out for now. The enable echo therefore has no effect.

  iface name wan0

By default, ppp will name its connections as tun0, tun1 and so on (tun being Tunnel). This means that you never know what the interface is going to be called, as other tunnels may exist before you start this one. We’re going to be referring to the interface in the PF firewall, so it helps to be sure what its name will be. The line above sets the name manually, and I’ve called it wan0, which is logical. You may, of course, have multiple WAN connections including dial-up backups, so giving them a sensible name is, er, sensible. You can call it anything you like if you’re nuts.

  add default HISADDR

This is an example of ppp messing with your system configuration. HISADDR is a macro standing for the address of the other end of the link (the ISP’s side), as negotiated when the connection comes up, and this line adds a default route via it. If you have a static IP address you might want to set this up statically in the normal way.

Likewise, if you add the line “enable dns” it will take the DNS servers offered by the ISP during IPCP negotiation and add them to resolv.conf. It won’t remove them, and may well end up messing up whatever local DNS arrangements you have, so I prefer to do this manually.

Once you’ve edited ppp.conf you can test it out interactively with “ppp cloudscape” and see what happens. Type “dial” and it should make the connection, and wan0 should appear in your list of network interfaces. Use netstat -r to see if the new default route has appeared.
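If you’d rather check the route from a script than by eye, here’s a sketch that picks the interface carrying the IPv4 default route out of netstat -rn output. It assumes the recent FreeBSD column layout (Destination, Gateway, Flags, Netif, Expire), so Netif is the fourth field; other systems order the columns differently. The function name is hypothetical.

```shell
#!/bin/sh
# Print the interface used by the IPv4 default route.
# Reads "netstat -rn -f inet" output on stdin and assumes Netif
# is the fourth column, as on recent FreeBSD.
default_iface() {
    awk '$1 == "default" { print $4; exit }'
}
```

Usage: netstat -rn -f inet | default_iface should print wan0 once ppp has dialled.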

Setting up the pf firewall

user-ppp is a large program that tries to do everything, including NAT and being a firewall. This isn’t very UNIX-like in philosophy, but you can use these facilities if you like. I prefer to have a dedicated standard firewall, PF, and leave that to do everything firewall-like in one place.

If you’re setting up a router you’re probably going to need NAT, so the whole LAN can share your one public address. Your /etc/pf.conf file will look something like this:

scrub in all
WAN=wan0
WANIP=1.2.3.4
nat pass on $WAN from 192.168.1.0/24 to any -> $WANIP
#rdr pass on $WAN proto tcp from any to $WANIP port 80 -> 192.168.1.123

The WAN IP comes from your ISP, although you will be able to see it using “ifconfig wan0” if you don’t have it. I’m assuming your LAN is 192.168.1.0/24 – just set this to whatever you’re using. And that’s about it.

As a bonus, the commented-out example line at the end would redirect external port 80 to a web server on LAN address 192.168.1.123, giving you an open port. Peter Hansteen has written an excellent book on PF, called “The Book of PF”, which will tell you everything you need to know, and it’s well documented in various online handbooks and man pages, unlike user-ppp’s built-in firewall.

The main reason for using user-ppp for NAT is if you’re on a dynamic IP address, in which case add ppp_nat=yes to /etc/rc.conf (or “nat enable yes” to the profile in ppp.conf).
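If your address is dynamic, PF can still do the NAT: putting the interface name in parentheses makes PF use whatever address the interface currently has, and update the translation when it changes. A minimal pf.conf sketch, assuming the interface is named wan0 as in the ppp.conf above and the LAN is 192.168.1.0/24:

```
WAN = "wan0"
LAN = "192.168.1.0/24"
scrub in all
# ($WAN) means the current address of wan0, followed as it changes
nat pass on $WAN from $LAN to any -> ($WAN)
```

This avoids hard-coding WANIP, so the same file keeps working after the ISP hands out a new address.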

Kicking it all off

First you need to enable routing:

 sysctl net.inet.ip.forwarding=1

This will work until reboot, and you can turn it off again by setting it to zero if something bad happens, like your NIC catching fire. Then dial your ISP (Cloudscape in this example)

ppp -ddial cloudscape

You should now have a connection to the Internet on the BSD box. Now enable PF for NAT.

service pf start (or onestart)

If it’s already running, use “service pf reload” to load the new config. At this point every machine on the LAN should be able to use the router’s LAN IP address as its gateway.

When you’re happy it works, to make this kick off automatically on boot, modify /etc/rc.conf:

sysrc ppp_enable=yes
sysrc ppp_mode=ddial
sysrc ppp_profile="cloudscape"
sysrc pf_enable=yes
sysrc gateway_enable=yes

Optionally “sysrc ppp_nat=yes” if you’re not using PF for NAT. Or if you’re editing rc.conf directly:

pf_enable=yes
gateway_enable=yes

ppp_enable="YES"
ppp_mode="ddial"
#ppp_nat="YES"	# We let PF do NAT
ppp_profile="name_of_service_provider"

I will do a part two to this post explaining how to configure DNS and DHCP, although there’s no reason these need to be on the same host you’re using as a router. In fact it’s good practice to separate them and have more than one DHCP and DNS server if you have the resources.

I hope you found it useful – any questions add a comment below.

How to tell if a host is up without ping

Some people seem to think that disabling network pings (ICMP echo requests to be exact) is a great security enhancement. If attackers can’t ping something they won’t know it’s there. It’s called Security through Obscurity and only a fool would live in this paradise.

But supposing you have something on your network that disables pings and you, as the administrator, want to know if it’s up? My favourite method is to send an ARP packet to the IP address in question, and you’ll get a response.

ARP is how you translate an IP address into a MAC address to get the Ethernet packet to the right host. If you want to send an Ethernet packet to 1.2.3.4 you put out an ARP request: “Hi, if you’re 1.2.3.4 please send your MAC address to my MAC address”. If a device doesn’t respond to this then it can’t be on your Ethernet network with that IP address at all.

You can quickly write a program to do this in ‘C’, but you can also do it using a shell script, and here’s a proof of concept.

#!/bin/sh
test -n "$1" || { echo "$0: Missing hostname/IP" >&2; exit 1; }
#arp -d "$1" >/dev/null 2>&1
ping -t 1 -c 1 -q "$1" >/dev/null
arp "$1" | grep -q "expires in" && echo "$1 is up." && exit
echo "$1 is down."

You run this with a single argument (hostname or IP address) and it will print out whether it is down or up.

The first line is simply the shell needed to run the script.

Line 2 bails out if you forget to add an argument.

Line 3, which is commented out, deletes the host from the ARP cache if it’s already there. This probably isn’t necessary in reality, and you need to be root user to do it. IP address mappings are typically deleted after 20 minutes, but as we’re about to initiate a connection in line 4 it’ll be refreshed anyway.

Line 4 sends a ping to the host. We don’t care if it replies. The timeout is set to the minimum 1 second, which means there’s a one second delay if it doesn’t reply. Other ways of tricking the host into replying exist, but every system has ping, so ping it is here.

Line 5 will print <hostname> is up if there is a valid ARP cache entry, which can be determined by the presence of “expires in” in the output. Adjust as necessary.

The last line, if still running, prints <hostname> is down. Obviously.

This only works across Ethernet – you can’t get an ARP resolution on a different network (i.e. once the traffic has got through a router). But if you’re on your organisation’s LAN and looking to see if an IoT device is offline, lost or stolen then this is a quick way to poll it and check.
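To poll a handful of devices in one go, the same ping-then-arp trick can go in a loop. A sketch, assuming FreeBSD ping and arp as in the script above; poll_hosts and probe are made-up names, and probe is kept separate so it can be swapped for a fake when testing without a network. Each host that doesn’t answer costs up to a second.

```shell
#!/bin/sh
# Poll several hosts with the ping-then-arp trick.
# Assumes FreeBSD ping (-t is a timeout) and arp output format.
probe() {
    ping -t 1 -c 1 -q "$1" >/dev/null 2>&1
    arp "$1" 2>/dev/null
}

poll_hosts() {
    for h in "$@"; do
        # A resolved dynamic ARP entry shows "expires in"
        if probe "$h" | grep -q "expires in"; then
            echo "$h is up."
        else
            echo "$h is down."
        fi
    done
}
```

Usage: poll_hosts 192.168.1.50 192.168.1.51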

Why can’t I ping my Amazon Echo?

The simple answer is that the current Amazon Echo devices don’t respond to a ping – or technically an ICMP echo request. There’s a lot of waffle on the web saying this is because they’re too simple to do it, but this isn’t the case. The original Echo (at least before software updates) and the Echo Show 8-inch most certainly did respond to a ping, but the functionality has been dropped since then. Some people naively think that it’s a security risk, part of a doctrine known as Security Through Obscurity. As it’s easy enough to find an Echo without a ping, it’s only a slight inconvenience to a would-be attacker and a big inconvenience to a network administrator.

Most later Echos do have open ports, however, so you can check to see if it’s alive because the port will be there. I emphasise “open”, as Echos use quite a lot of ports that aren’t always open, for things like setup or communicating out. But these ports are open and can be connected to – even if the connection is refused it shows there’s something there to refuse it.

Based on my incomplete collection of Echo devices, they have the following characteristics:

Model | Ping? | Open ports
Original Echo | |
Echo Dot fourth generation | | 1080, 6543, 8888
Echo Flex | | 1080, 8888
Echo Dot second generation | | 1080, 8888
Echo Dot third generation | | 1080, 8888
Echo Show 8-inch (second generation) | Y | 8009
Echo Spot first generation | |
Echo Show 5-inch | |

So how can you reliably tell if your Amazon Echo device is alive on the network? Rather than messing around with ports, my favourite way is to send it an Ethernet ARP request and see if you get a reply. I did say disabling ping was a fool’s solution to security.

See here for how to do this.

Microsoft releases WSL as open source

Microsoft has just open-sourced its Windows Subsystem for Linux (WSL).

https://blogs.windows.com/windowsdeveloper/2025/05/19/the-windows-subsystem-for-linux-is-now-open-source/

This is major. WSL runs the FOSS Unix knock-off on their closed source and expensive operating system, making it possible to host Unix applications on it. Cynics might think this was a ploy to still sell a Windows server license instead of people running Linux direct on the hardware. Or you could say it allows lower skilled Windows administrators who couldn’t cope with a command line to still access Linux applications.

Since it first appeared, people have been questioning Microsoft’s open source credentials, as WSL was closed source. Not now. You can get at the source code, customise it and run your own version.

This is great news, but as with anything Microsoft, it’s probably another cyber security attack vector for Windows.

Add mirror to single ZFS disk

So you have a single-drive FreeBSD ZFS machine and you want to add a second drive to mirror the first, because it turns out it’s now important. Yes, it’s possible to do this after installation, even if you’re booting off ZFS.

Let’s assume your first drive is ada0, and the FreeBSD installer set it up as a “stripe on one drive” using GPT partitioning. You called the existing zpool “zroot” as you have no imagination whatsoever. In other words, everything is the default. The new disk is probably going to be ada1; plug it in and look on the console or in /var/log/messages to be sure. As long as it’s the same size as the first or larger, you’re good to go (use diskinfo -v if you’re not sure).

FreeBSD sets up boot partitions and swap on the existing drive, and you’ll probably want to do this on the new one, if for no other reason than if ada0 fails it can boot off ada1.

gpart destroy -F ada1
gpart backup ada0 | gpart restore ada1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

This gets rid of any old partition table that might be there, copies the existing one from ada0 (which will include the boot and swap partitions as well as the ZFS one).

The third line installs a protective MBR on the disk to avoid non-FreeBSD utilities doing bad things and then adds the ZFS boot code.

If there’s a problem, zero the disk using dd and try again. Make sure you zap the correct drive, of course.

dd if=/dev/zero of=/dev/ada1 bs=32m status=progress

Once you’ve got the partitions and boot code set up, all you need to do is attach the new partition to the zpool. This is where people get confused: if you do it wrong (for example with zpool add) you end up with a second vdev rather than a mirror. Note that the ZFS pool is on the third partition on each drive, i.e. adaXp3.
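Installer layouts can vary (an install with an EFI partition in front, for example, moves everything up one), so it’s worth checking which partition really holds ZFS before attaching. A sketch that reads gpart show -p output and prints the freebsd-zfs partition; the function name is made up and the column positions are assumed from FreeBSD’s gpart output.

```shell
#!/bin/sh
# Print the freebsd-zfs partition(s) on a disk.
# Reads "gpart show -p DISK" output on stdin: the third column is
# the partition name and the fourth its type.
zfs_part() {
    awk '$4 == "freebsd-zfs" { print $3 }'
}
```

Usage: zpool attach zroot "$(gpart show -p ada0 | zfs_part)" "$(gpart show -p ada1 | zfs_part)"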

The trick is to specify both the existing and new drives:

zpool attach zroot ada0p3 ada1p3

Run zpool status and you’ll see it (re)silvering the new drive. No interruptions, no reboot.

  pool: zroot
 state: ONLINE
  scan: resilvered 677M in 00:00:18 with 0 errors on Sat Apr 5 16:13:16 2025
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0

This only took 18 seconds to resilver because in this case it’s just a system disk, and ZFS doesn’t bother copying unused blocks.
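To confirm from a script that you got a mirror and not a second vdev, here’s a sketch that checks zpool status output for an ONLINE mirror-N line. With a second vdev by mistake, the new partition sits alongside ada0p3 directly under zroot and there’s no mirror-N line at all. The function name is hypothetical; a DEGRADED mirror fails the check too, which is probably what you want.

```shell
#!/bin/sh
# Succeed if "zpool status" output on stdin shows an ONLINE
# mirror vdev (a line such as "mirror-0  ONLINE ...").
has_mirror() {
    grep -Eq '^[[:space:]]*mirror-[0-9]+[[:space:]]+ONLINE'
}
```

Usage: zpool status zroot | has_mirror && echo mirrored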

If you want to remove it and go back to a single drive the command is:

zpool detach zroot ada1p3

Add another to create a three-way mirror. Go a little crazy!

Set up FreeBSD in two mirrored drives using UFS

I’ve written about the virtues of Geom Mirror (gmirror) in the past. Geom Mirror was probably the best way of implementing redundant storage from FreeBSD 5.3 (2004) until ZFS was introduced in FreeBSD 7.0 in 2008. Even then, ZFS was heavyweight, and Geom Mirror was tested and more practical for many years afterwards.

The Geom system also has a RAID3 driver. RAID3 is weird. It’s the one using a separate parity drive. It works, but it wasn’t popular. If you had a big FreeBSD system and wanted an array it was probably better to use an LSI host bus adapter and have that manage it with mptutil. But for small servers, especially remotely managed, Geom Mirror was the best. I’m still running it on a few twin-drive servers, and will probably continue for some time to come.

The traditional Unix File System (UFS2) actually has a couple of advantages over ZFS. Firstly it has much lower resource requirements. Secondly, and this is a big one, it has in-place updates. This is a big deal with random access files, such as databases or VM hard disks, as the Copy-on-Write system ZFS uses fragments the disk like crazy. To maintain performance on a massively fragmented file system, ZFS requires a huge amount of cache RAM.

What you need for random access read/write files are in-place updates. Database engines handle transaction groups themselves to ensure that the data structure’s integrity is maintained. ZFS does this at the file level instead of application level, which isn’t really good enough as the application knows what is and what isn’t required. There’s no harm in ZFS doing it too, but it’s a waste. And the file fragmentation is a high price to pay.

So, for database type applications, UFS2 still rules. There’s nothing wrong with having a hybrid system with both UFS and ZFS, even on the same disk. Just mount the UFS /var onto the ZFS tree.

But back to the twin-drive system: the FreeBSD installer doesn’t have this as an option. So here’s a handy dandy script wot I rote to do it for you. Boot off a USB stick or whatever and run it.

Script to install FreeBSD on gmirror

Use as much or as little as you like.

At the beginning of the script I define the two drives I will be using. Obviously change these! If the disks are not blank it might not work. The script tries to destroy the old partition data but you may need to do more if you have it set up with something unusual.

Be careful – it will delete everything on both drives without asking!

Read the comments in the script. I have set it up to use an 8g UFS partition, but if you leave out the “-s 8g” the final partition will use all the space, which is probably what you want. For debugging I kept it small.

I have put everything on a single UFS partition. If you want separate /, /usr and /var then you need to modify it to what you need and create a mirror for each (and run newfs for each). The only thing is that I’ve created a swap partition on each drive that is NOT mirrored and configured the system to use both.

I have not set up everything on the new system, but it will boot and you can configure other stuff by hand as you need. I like to connect to the network and have an admin user so I can work on a remote terminal straight away, so I have created an “admin” user with password “password” and enabled the ssh daemon. As you probably know, FreeBSD names its Ethernet adapters after the driver, which depends on the chipset, and you don’t know what you’ll have, so I just have it try DHCP on every likely interface. Edit the rc.conf file how you need it once it’s running.

If base.txz and kernel.txz are in the current directory, fine. The script tries to download them at present.

And finally, I call my mirrors m0, m1, m2 and so on. Some people like to use gm0. It really doesn’t matter what you call them.

#!/bin/sh
# Install FreeBSD on two new disks set up as a gmirror
# FJL 2025
# Edit stuff in here as needed. At present it downloads
# FreeBSD 14.2-RELEASE and uses the disks
# set in D0 and D1 below

# Fetch the OS files if needed (and as appropriate)
fetch https://download.freebsd.org/ftp/releases/amd64/14.2-RELEASE/kernel.txz
fetch https://download.freebsd.org/ftp/releases/amd64/14.2-RELEASE/base.txz

# Disks to use for a mirror. All will be destroyed! Edit these. The -xxxx
# is there to save you if you don't
D0=/dev/da1-xxxxx
D1=/dev/da2-xxxxx

# User name and password to set up initial user.
ADMIN=admin
ADMINPASS=password

# Make sure the geom mirror module is loaded.
kldload geom_mirror

# Set up the first drive
echo Clearing $D0
gpart destroy -F $D0
dd if=/dev/zero of=$D0 bs=1m count=10

# Then create p1 (boot), p2 (swap) and p3 (ufs)
# Note the size of the UFS partition is set to 8g. If you delete
# the -s 8g it will use the rest of the disk by default. For testing
# it's better to have something small so newfs finishes quick.

echo Creating gpt partition table on $D0
gpart create -s gpt $D0
gpart add -t freebsd-boot -s 512K $D0
gpart add -t freebsd-swap -s 4g $D0
gpart add -t freebsd-ufs -s 8g $D0

echo Installing boot code on $D0
# -b installs protective MBR, -i the Bootloader.
# Assumes partition 1 is freebsd-boot created above.
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 $D0

# Set up second drive
echo Clearing $D1
gpart destroy -F $D1
dd if=/dev/zero of=$D1 bs=1m count=10

# Copy partition data to second drive and put on boot code
gpart backup $D0 | gpart restore $D1
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 $D1

# Mirror partition 3 on both drives
gmirror label -v m0 ${D0}p3 ${D1}p3

echo Creating file system
newfs -U /dev/mirror/m0
mkdir -p /mnt/freebsdsys
mount  /dev/mirror/m0 /mnt/freebsdsys

echo Decompressing Kernel
tar -x -C /mnt/freebsdsys -f kernel.txz
echo Decompressing Base system
tar -x -C /mnt/freebsdsys -f base.txz

# Tell the loader where to mount the root system from
echo 'geom_mirror_load="YES"' > /mnt/freebsdsys/boot/loader.conf
echo 'vfs.root.mountfrom="ufs:/dev/mirror/m0"' \
>> /mnt/freebsdsys/boot/loader.conf

# Set up fstab so it all mounts.
echo $D0'p2 none swap sw 0 0' > /mnt/freebsdsys/etc/fstab
echo $D1'p2 none swap sw 0 0' >> /mnt/freebsdsys/etc/fstab
echo '/dev/mirror/m0 / ufs rw 1 1' >> /mnt/freebsdsys/etc/fstab

# Enable sshd and make ethernet interfaces DHCP configure
echo 'sshd_enable="YES"' >/mnt/freebsdsys/etc/rc.conf
for int in em0 igb0 re0 bge0 alc0 fxp0 xl0 ue0 cxgbe0 bnxt0 mlx0
do
echo 'ifconfig_'$int'="DHCP"' >>/mnt/freebsdsys/etc/rc.conf
done

# Create initial user suitable for ssh login
pw -R /mnt/freebsdsys useradd $ADMIN -G wheel -m
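# -h 0 reads a plain-text password on stdin and lets pw hash it;
# -H 0 reads an already-hashed one, here a SHA-512 hash from openssl.
# Either is enough on its own; the second overrides the first.
echo "$ADMINPASS" | pw -R /mnt/freebsdsys usermod -n $ADMIN -h 0
echo "$ADMINPASS" | openssl passwd -6 -stdin | pw -R /mnt/freebsdsys usermod -n $ADMIN -H 0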

# Tidy up
umount /mnt/freebsdsys
echo Done. Remove USB stick or whatever and reboot.

Configuring host names using DHCP

Background

If you know all about DHCP, feel free to skip this bit.

In the Unix world the network administrator assigns every host (networked computer) the stuff it needs to operate on the network: its name and IP address. Other hosts can find it by looking its name up on the DNS server (or in a hosts list before DNS was invented) and start talking.

The host knew its name and IP address because they were set in a configuration file, along with other network stuff like gateway routers and DNS servers.

Microsoft didn’t use IP networking for a long time, using NetBEUI and other protocols to dispense with a network administrator and configure stuff automatically over Ethernet (mainly). Or was that NetBIOS or WINS or ??? Anyway, the usual bugger’s muddle. When Microsoft finally realised the Internet was Important, Windoze machines also worked with Unix networking (IP, DNS and other good things). They stuck with versions of their own crazy file sharing system, but that’s another story.

Meanwhile, it was realised that editing a configuration file on every host was a bit of a problem, especially as you had to edit them all whenever you changed anything network-ish. So Dynamic Host Configuration Protocol (DHCP) was invented in the early 1990s. This combined the best of both worlds: automatic configuration with a network administrator in charge.

DHCP operates using a DHCP server. When a host boots it can get its network stuff from the DHCP server before it knows anything about the IP network. It does this by broadcasting a request (a UDP packet sent to the broadcast address, which goes out as an Ethernet broadcast), but the details are complicated and not relevant here.

The DHCP server sees this request for details and sends the host back its settings. These could be the next free IP address from a pool, together with other important information like the subnet, gateway, local DNS and domain name. The host says “thank you very much” and configures itself as a fine upstanding and proper member of the domain. Don’t confuse domain with Microsoft Domain stuff, BTW. They used the name wrong. This is the DNS-type domain.

Manual allocation

I said in the bit you skipped reading that the DHCP server could send the client the next free IP address from a pool. But, you can also send precise details you want the host configured with. This means you can keep your network configuration in one file on the DHCP server rather than in startup files on every host, see how everything is set up and make small or large changes with a text editor. Almost. You’ll also need to edit the files on your DNS server to make the names to IP addresses translation work. Having both servers on the same machine makes sense.

How does the DHCP server know who’s asking, and therefore which configuration to send? Easy: it goes by the Ethernet MAC address.

Assuming you know how to configure DNS, here’s how you do it.

dhcpd

You’ll need the DHCP daemon, “dhcpd”, from the Internet Systems Consortium (ISC). Compile it or install the package as necessary. It has a configuration file called dhcpd.conf (usually in /usr/local/etc or /etc) which is where you set everything up. It comes with examples, but you’re looking at something like this.

Let’s assume your organisation is called flubnutz.com and the DHCP server is on the LAN in the London office, i.e. london.flubnutz.com. The hosts on the LAN belong to tom, dick and harry, you’ve got a printer called “printer” and a router called “gateway”, and the local IP addresses are 192.168.3.x with a 255.255.255.0 subnet mask.

dhcpd.conf will start something like this

default-lease-time 43200;
max-lease-time 86400;

option domain-name "london.flubnutz.com";
option domain-name-servers 192.168.3.219;

subnet 192.168.3.0 netmask 255.255.255.0 {
  range 192.168.3.100 192.168.3.163;
  option broadcast-address 192.168.3.255;
  option routers 192.168.3.2;
}

The lease times are how long you’re going to allow a host to hold on to a dynamic address from the pool. If it held it forever, you’d eventually run out. “default-lease-time” is how long the address lasts (in seconds) if the client doesn’t ask for anything specific, and “max-lease-time” is the limit for when the client does ask but is being greedy. For testing purposes setting these to 60 seconds is not unreasonable. The values above represent 12 and 24 hours respectively.
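Since dhcpd wants lease times in seconds, a tiny pair of helpers saves doing the arithmetic in your head. This is a minimal sketch; secs_to_hours and hours_to_secs are hypothetical names, not part of dhcpd.

```shell
#!/bin/sh
# Sketch: convert between dhcpd lease times (seconds) and hours.
# secs_to_hours and hours_to_secs are hypothetical helper names.

secs_to_hours() {
    # Integer division; lease times are normally whole hours anyway.
    echo $(( $1 / 3600 ))
}

hours_to_secs() {
    echo $(( $1 * 3600 ))
}

# Print the two lease lines used in the example dhcpd.conf:
echo "default-lease-time $(hours_to_secs 12);"   # default-lease-time 43200;
echo "max-lease-time $(hours_to_secs 24);"       # max-lease-time 86400;
```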

Next come some options. These are fields sent in the DHCP reply, i.e. the stuff you can set on the client. There are a lot of them; see “man dhcp-options” on any system with ISC dhcpd installed.
Here I want everything on the LAN to know it’s part of london.flubnutz.com, and that the DNS server is at 192.168.3.219. Every host asking the DHCP server gets these options set for them.

The next definition is a subnet. Any IP address in that subnet gets those options set, in this case the broadcast address and gateway router. These could have been universal options, but for the sake of an example I put them inside the { and }.

Note there’s also a “range” statement in the subnet definition. This is the range of dynamically allocated IP addresses; in this case there are 64, from .100 to .163, there to cope with people’s smartphones and with people turning up from head office with their swanky laptops. The range doesn’t have to cover the complete subnet, but it can’t be larger.
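Since a range has to sit inside its subnet, it’s easy to check before restarting dhcpd. Here’s a minimal sketch that assumes a /24 (255.255.255.0) subnet as in the example; check_range is a hypothetical helper, not part of ISC dhcpd. It prints the pool size, or fails if either end of the range is outside the subnet.

```shell
#!/bin/sh
# Sketch: sanity-check a dhcpd "range" against a /24 subnet.
# Assumes a 255.255.255.0 netmask; check_range is a hypothetical helper.

check_range() {
    subnet=$1   # e.g. 192.168.3.0
    first=$2    # e.g. 192.168.3.100
    last=$3     # e.g. 192.168.3.163
    net=${subnet%.*}          # first three octets, e.g. 192.168.3
    for addr in "$first" "$last"; do
        if [ "${addr%.*}" != "$net" ]; then
            echo "error: $addr is outside $subnet/24" >&2
            return 1
        fi
    done
    lo=${first##*.}           # last octet of the first address
    hi=${last##*.}            # last octet of the last address
    if [ "$lo" -gt "$hi" ]; then
        echo "error: range start is after range end" >&2
        return 1
    fi
    echo $(( hi - lo + 1 ))   # number of addresses in the pool
}

check_range 192.168.3.0 192.168.3.100 192.168.3.163   # prints 64
```

A range that starts in the wrong subnet, such as 192.168.1.100 to 192.168.3.163, makes it return an error instead.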

And that’s pretty much it for the main part. This just leaves the manual definitions which take the form of host statements that look like this:

host tom {
  hardware ethernet c4:34:6b:21:94:10;
  option host-name "tom.london.flubnutz.com";
  fixed-address 192.168.3.165;
}
host dick {
  hardware ethernet 3c:4a:92:77:af:4e;
  option host-name "dick.london.flubnutz.com";
  fixed-address 192.168.3.166;
}
host printer {
  hardware ethernet 2C:76:8A:AD:71:FF;
  option host-name "printer.london.flubnutz.com";
  fixed-address 192.168.3.200;
}

And so on…

The DHCP server recognises each host by its MAC address, specified in each block. Other forms of hardware address are possible, but it’s probably going to be a MAC on Ethernet. The fixed address is the one that will be assigned. The subnet definition at the top will be used for the subnet mask, and the other options will be taken from the global options.
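With a lot of hosts, typing these blocks by hand gets error-prone. One approach is to keep a simple table of name, MAC and address and generate the blocks from it. This is a sketch only; host_block and the DOMAIN value are illustrative assumptions, and you’d paste or include the output into dhcpd.conf yourself.

```shell
#!/bin/sh
# Sketch: generate dhcpd.conf host blocks from a name, MAC and address.
# host_block and DOMAIN are illustrative assumptions.

DOMAIN=london.flubnutz.com

host_block() {
    name=$1 mac=$2 ip=$3
    printf 'host %s {\n' "$name"
    printf '  hardware ethernet %s;\n' "$mac"
    # The full name is spelled out, as dhcpd won't append the domain.
    printf '  option host-name "%s.%s";\n' "$name" "$DOMAIN"
    printf '  fixed-address %s;\n' "$ip"
    printf '}\n'
}

# One line per host: name, MAC address, fixed IP address.
while read -r name mac ip; do
    host_block "$name" "$mac" "$ip"
done <<EOF
tom c4:34:6b:21:94:10 192.168.3.165
dick 3c:4a:92:77:af:4e 192.168.3.166
printer 2C:76:8A:AD:71:FF 192.168.3.200
EOF
```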

If you want something special for one host, just add the option to its definition. For example, if you wanted the printer to use a different gateway router, just add “option routers 192.168.3.254;” and it’d take precedence.

The host statement needs a name or IP address, but we’re not using it for anything here; in fact it can be anything you like in this instance. Unfortunately, by default it’s not this name that’s sent as the hostname. We have to specify that with option host-name, and if you want an FQDN you’ll have to give the full name, as the domain-name option isn’t appended automatically. (ISC dhcpd does have a use-host-decl-names flag that sends the host statement’s name instead.) I think this is a fault of the client, and I haven’t quite figured out why yet.

dhclient

On the host you need to run dhclient to request the address from the DHCP server. This has a configuration file: /etc/dhclient.conf. It’s probably empty, as the defaults are normally good enough. However, the defaults don’t ask for the host name, so you’ll need to add a single line:

request host-name;
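If you’re setting up lots of machines, it helps to add that line from a script that won’t add it twice when run again. This is a minimal sketch; ensure_request_hostname is a hypothetical helper, and the path is passed in (on FreeBSD it would be /etc/dhclient.conf).

```shell
#!/bin/sh
# Sketch: append "request host-name;" to dhclient.conf only if it is not
# already there, so the script is safe to run more than once.
# ensure_request_hostname is a hypothetical helper name.

ensure_request_hostname() {
    conf=$1
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF 'request host-name;' "$conf" 2>/dev/null; then
        echo 'request host-name;' >> "$conf"
    fi
}

# Usage (as root): ensure_request_hostname /etc/dhclient.conf
```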

And that’s it. How you use it will vary from system to system, but on BSD you use “dhclient re0”, where re0 is the name of the Ethernet interface, and it does the rest. To make this automatic in FreeBSD add this to rc.conf:

ifconfig_re0="DHCP"

Make sure you don’t set hostname in rc.conf, or it will take precedence. The installer will normally have added one, so you may need to remove it.
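A quick way to catch a leftover hostname line is to grep for it. Here’s a sketch that only looks for an uncommented hostname= line in the file you give it; it doesn’t check other places rc.conf settings can come from, such as /etc/rc.conf.local. check_rc_hostname is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: warn if rc.conf still sets hostname=, which would override the
# name supplied by DHCP. Only checks the one file given (default
# /etc/rc.conf). check_rc_hostname is a hypothetical helper name.

check_rc_hostname() {
    rc=${1:-/etc/rc.conf}
    # Match hostname= at the start of a line, ignoring leading spaces;
    # commented-out lines (#hostname=...) don't match.
    if grep -q '^[[:space:]]*hostname=' "$rc"; then
        echo "warning: $rc sets hostname; DHCP-supplied name will be ignored" >&2
        return 1
    fi
    return 0
}
```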

Why set the hostname using DHCP?

You might think that it’s more useful for the hostname to be fixed on the actual hardware host, and most of the time it is. However, if you’re moving disks from one machine to another you may or may not want the hostname and IP address to move with them. If you do, set them in the config files on the disk. If you want DHCP to configure things correctly even when you’ve swapped system disks around, configure things on the DHCP server. If you’re cloning system disks for a large number of servers in a cluster, DHCP is your best friend. Guess what I’m working on?

ZFS In-place disk size upgrade

Everyone knows that you can replace the drives in a ZFS vdev with larger ones one at a time, and when the last one is inserted it automagically uses the extra space, right?

But who’s actually done this? It does actually work, kind of.

However, many small-scale ZFS users boot from ZFS, and have been able to since FreeBSD 10. Simply swapping out the drives for larger ones isn’t going to work. It can’t work. You’ve got boot code, swap partitions and other stuff to complicate it. But it can be made to work, and here’s how.

The first thing you need to consider is that ZFS is a volume manager, and normally when you create an array (RAIDZ or mirror) it expects to manage the whole disk. When you’re creating a boot environment you need bootstraps to actually boot from it. FreeBSD can do this, and has done by default since FreeBSD 10 was released in 2014. The installer handles the tricky stuff about partitioning the disks and making sure it’ll still boot when one drive is missing.

If you look at the partition table on one of the disks in the array you’ll see something like this:

=>        40  5860533088  ada0  GPT  (2.7T)
          40        1024     1  freebsd-boot  (512K)
        1064         984        - free -  (492K)
        2048     4194304     2  freebsd-swap  (2.0G)
     4196352  5856335872     3  freebsd-zfs  (2.7T)
  5860532224         904        - free -  (452K)

So what’s going on here?

We’re using the modern GPT partitioning scheme. You may as well go with the flow (but see articles about gmirror). This is a so-called 3TB SATA disk, but it’s really 2.7TB as manufacturers don’t know what a TB really is (2^40 bytes, not 10^12). FreeBSD does know what a TB, GB, MB and KB is in binary, so the numbers you see here won’t always match the label.
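The arithmetic behind the 2.7T is easy to check with shell integer maths. This sketch takes the sector count from the gpart show output above and assumes 512-byte sectors; it works in tenths to get one decimal place without floating point.

```shell
#!/bin/sh
# Sketch: why a "3TB" disk shows up as 2.7T. Sector count copied from
# the gpart show output; 512-byte sectors assumed.

sectors=5860533088
bytes=$(( sectors * 512 ))

# Decimal terabytes (10^12 bytes), as on the label, in tenths
tb_x10=$(( bytes * 10 / 1000000000000 ))
# Binary terabytes (2^40 bytes), as FreeBSD reports, in tenths
tib_x10=$(( bytes * 10 / 1099511627776 ))

echo "$bytes bytes"                                   # 3000592941056 bytes
echo "decimal: $(( tb_x10 / 10 )).$(( tb_x10 % 10 ))TB"   # decimal: 3.0TB
echo "binary:  $(( tib_x10 / 10 )).$(( tib_x10 % 10 ))T"  # binary:  2.7T
```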

The disk starts with 40 sectors of GPT partition table, followed by the partitions themselves.

The first partition is 512K long and contains the freebsd-boot code. 512K is a lot of boot code, but ZFS is a complicated filing system so it needs quite a lot to be able to read it before the OS kernel is loaded.

The second partition is freebsd-swap. This is just a block of disk space the kernel can use for paging. By labelling it freebsd-swap, FreeBSD can find it and use it. On an array, each drive has a bit of paging space so the load is shared across all of them. It doesn’t have to be this way, but it’s how the FreeBSD installer does it. If you have an SLOG drive it might make sense to put all the swap on that.

The third partition is actually used for ZFS, and is the bulk of the disk.

You might be wondering what the “- free -” space is all about. For performance reasons it’s good practice to align partitions to a particular grain size, in this case 1MiB (2048 sectors of 512 bytes). I won’t go into it here; suffice to say that the FreeBSD installer knows what it’s doing, and has left the appropriate gaps.
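You can confirm the alignment yourself: a partition is 1MiB-aligned if its start sector divides exactly by 2048. This sketch assumes 512-byte sectors and uses the swap and ZFS start sectors from the gpart show output above; is_aligned is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: check that partition start sectors are aligned to 1MiB.
# Assumes 512-byte sectors, so 1MiB = 2048 sectors.

ALIGN=2048

is_aligned() {
    [ $(( $1 % ALIGN )) -eq 0 ]
}

# Start sectors of freebsd-swap and freebsd-zfs from gpart show
for start in 2048 4196352; do
    if is_aligned "$start"; then
        echo "$start: aligned"
    else
        echo "$start: NOT aligned"
    fi
done
```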

As I said, ZFS expects to have a whole disk to play with, so normally you’d create an array with something like this:

zpool create mypool raidz1 da0 da1 da2 da3

This creates a RAIDZ1 called mypool out of four drives. But ZFS will also work with geoms (partitions). With the partition scheme shown above the creation command would be:

zpool create mypool raidz1 da0p3 da1p3 da2p3 da3p3

ZFS would use partition 3 on all four drives and leave the boot code and swap area alone. And this is effectively what the installer does. da#p2 would be used for swap, and da#p1 would hold the boot code, replicated on every drive so it’s available on any drive that is still working and that the BIOS can find.
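With more drives, it’s easy to get the partition list wrong when typing it out. This sketch builds the member list from a list of disks and prints the command for checking rather than running it. The pool name, disk names and partition suffix are the assumptions from the example above.

```shell
#!/bin/sh
# Sketch: build the partition-based member list for zpool create and
# print the command instead of running it.

POOL=mypool
PART=p3

members=""
for disk in da0 da1 da2 da3; do
    members="$members $disk$PART"
done

echo "zpool create $POOL raidz1$members"
# prints: zpool create mypool raidz1 da0p3 da1p3 da2p3 da3p3
```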

So, if we’re going to swap out our small drives with larger ones we’re going to have to sort out the extra complications from being bootable. Fortunately it’s not too hard. But before we start, if you want the pool to expand automatically you need to set an option:

zpool set autoexpand=on zroot

However, you can also expand it manually when you online the new drive using the -e option.

From here I’m going to assume a few things. We have a RAIDZ set up across four drives: da0, da1, da2 and da3. The new drives are larger, and blank (no partition table). Sometimes you can get into trouble if they have the wrong stuff in the partition table, so blanking them is best, and if you blank the whole drive you’ll have some confidence it’s a good one. It’s also worth mentioning at some point that you can’t shrink the pool by using smaller drives, so I’ll mention it now. You can only go bigger.

You’ll also have to turn the swap off, as we’ll be pulling drives that hold swap partitions. However, if you’re not using any swap space you should get away with it. Run swapctl -l to see what’s being used, and use swapoff to turn off swapping on any drive we’re about to pull (for example, swapoff /dev/da0p2). Also, back up everything to tape or something before messing with any of this, right?

Ready to go? Starting with da0…

zpool offline zroot da0p3

Pull da0 and put the new drive in. It’s worth checking the console to make sure the drive you’ve pulled really is da0, and that the new drive is also identified as da0. If you pull the wrong drive, put it back and use “zpool online zroot da0p3” to put it back. The one you actually pulled will be offline.

We could hand partition it, but it’s easier to simply copy the partition table from one of the other drives:

gpart backup da1 | gpart restore da0

The copied table matches the old, smaller drive, so all the extra space is left unused at the end of the disk. We can fix this:

gpart resize -i 3 da0

When you don’t specify a new size with -s, this will change the third partition to take up all the remaining space. There’s no need to leave an alignment gap at the end, but if you want to do the arithmetic you can: take the space available in 512-byte sectors, round it down to a multiple of 2048 (1MiB) and pass that with -s. Alternatively, gpart resize accepts -a 1m to do the alignment for you. The only point I can see in doing this is if you’re going to add another partition afterwards and align it, but you’re not.
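The rounding is one line of integer arithmetic: divide by 2048, then multiply back. This is a sketch assuming 512-byte sectors; the free-space figure is a made-up example rather than a real disk, align_down is a hypothetical helper, and the gpart command is only printed.

```shell
#!/bin/sh
# Sketch: round a size in 512-byte sectors down to a whole number of
# 1MiB units, for use with gpart resize -s. The figure is made up.

ALIGN=2048   # 1MiB in 512-byte sectors

align_down() {
    # Integer division discards the remainder, so this rounds down.
    echo $(( $1 / ALIGN * ALIGN ))
}

free=7809238016     # hypothetical sectors available to partition 3
size=$(align_down "$free")
echo "gpart resize -i 3 -s $size da0"
# prints: gpart resize -i 3 -s 7809236992 da0
```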

Next we’ll add the boot code:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

And finally put it in the array:

zpool replace zroot da0p3

Run zpool status and watch as the array is rebuilt. This may take several hours, or even days.

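Once the resilvering is complete and the array looks good we can do the same with the next drive. For the remaining drives we copy the partition table from the new da0, which already has the enlarged partition 3, so no resize is needed (assuming the new drives are all the same size).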

zpool offline zroot da1p3

Swap the old and new disk and wait for it to come online.

gpart backup da0 | gpart restore da1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1
zpool replace zroot da1p3

Wait for resilvering to finish.

zpool offline zroot da2p3

Swap the old and new disk and wait for it to come online.

gpart backup da0 | gpart restore da2
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da2
zpool replace zroot da2p3

Wait for resilvering to finish.

zpool offline zroot da3p3

Swap the old and new disk and wait for it to come online.

gpart backup da0 | gpart restore da3
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da3
zpool replace zroot da3p3

Wait for resilvering to finish. Your pool is now expanded!
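The steps for each remaining drive are identical apart from the disk name, so they can be sketched as a function that prints them for you to follow. It deliberately doesn’t run anything, since each step involves pulling a physical drive and waiting for a resilver. The pool name, partition layout (p1 boot, p3 ZFS) and disk names are the assumptions used above; replace_steps is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print (not run) the commands for replacing one drive in the
# pool, copying the partition table from a drive already replaced.

POOL=zroot

replace_steps() {
    old=$1   # drive being replaced, e.g. da1
    src=$2   # already-replaced drive to copy the table from, e.g. da0
    echo "zpool offline $POOL ${old}p3"
    echo "# now swap the physical drive and wait for $old to reappear"
    echo "gpart backup $src | gpart restore $old"
    echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $old"
    echo "zpool replace $POOL ${old}p3"
    echo "# wait for zpool status to show the resilver has finished"
}

for d in da1 da2 da3; do
    replace_steps "$d" da0
done
```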

If you didn’t have autoexpand enabled you’ll need to expand each drive manually: “zpool offline zroot da#p3” followed by “zpool online -e zroot da#p3”.
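Here’s the same pair of commands generated for every drive, printed for review rather than run. It’s a sketch assuming the zroot pool and p3 partitions used throughout; expand_steps is a hypothetical helper. Don’t move on to the next drive until zpool status shows the pool healthy again, or a RAIDZ1 could end up with two members missing.

```shell
#!/bin/sh
# Sketch: print the manual expansion commands for each drive when
# autoexpand was off. Review the output before running any of it.

expand_steps() {
    pool=$1; shift
    for d in "$@"; do
        echo "zpool offline $pool ${d}p3"
        echo "zpool online -e $pool ${d}p3"
        echo "# check zpool status is healthy before the next drive"
    done
}

expand_steps zroot da0 da1 da2 da3
```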