FreeBSD ZFS RAIDZ failed disk replacement

(for the rest of us)

I don’t know about you but most of my ZFS arrays are large, using SAS drives connected via SAS HBAs

to expanders that know which disk is where. I also have multiple redundancy in the zpool and hot spares, so I don’t need pay a visit just to replace a failed disk. And if I do I can get the enclosure to flash an LED over the drive I’m interested in replacing.

Except at home. At home I’ve got what a lot of people probably have. A small box with a four-drive cage running RAIDZ1 (3+1). And it’s SATA, because it really is a redundant array of independent drives. I do, of course, keep a cold spare I can swap out. Always make sure you you have at least one spare drive of the dimensions of a RAIDZ group, and know where you find it.

And to make it even more fun, you’re booting from the array itself.

After many years I started getting an intermittent CAM error, which isn’t good news. Either one of the drives was loose, or it was failing. And there’s no SAS infrastructure to help. If you’re in a similar position you’ve come to the right place.

WARNING. The examples in this article assume ada1 is the drive that’s failed. Don’t blindly copy/paste into a live system without changing this as appropriate

To change a failed or failing drive:

  • Find the drive
  • Remove the old drive
  • Configure the new drive
  • Tell the RAIDZ to use it

Finding the failed drive

First, identify your failing drive. The console message will probably tell you which one. ZFS won’t, unless it’s failed to the extent it’s been offlined. “zpool status” may tell you everything’s okay, but the console may be telling you:

Feb  2 17:56:23 zfs1 kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error
Feb 2 17:56:23 zfs1 kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error
Feb 2 17:56:23 zfs1 kernel: (ada1:ahcich1:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Feb 2 17:56:23 zfs1 kernel: (ada1:ahcich1:0:0:0): RES: 41 40 b0 71 20 00 f6 00 00 00 01
Feb 2 17:56:23 zfs1 kernel: (ada1:ahcich1:0:0:0): Retrying command, 0 more tries remain
Feb 2 17:56:23 zfs1 kernel: (ada1:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 d8 70 20 40 f6 00 00 01 00 00
Feb 2 17:56:23 zfs1 kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error
Feb 2 17:56:23 zfs1 kernel: (ada1:ahcich1:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Feb 2 17:56:23 zfs1 kernel: (ada1:ahcich1:0:0:0): RES: 41 40 b0 71 20 00 f6 00 00 00 01
Feb 2 17:56:23 zfs1 kernel: (ada1:ahcich1:0:0:0): Error 5, Retries exhausted

So this tells you that ada1 is in a bad way. But which is ada1? It might be the second one across in your enclosure, or it might not. You’ll need to do some hunting to identify it positively.

Luckily most disks have their serial number printed on the label, and this is available to the host. So finding the serial number for ada1 and matching it to the disk label is the best way – if you’ve only got four drives to check, anyway.

I know of at least five ways to get a disk serial number in FreeBSD, and I’ll list them all in case one stops working:

dmesg

Just grep for the drive name you’re interested in (ada1). This is probably a good idea as it may give you more information about the failure. If FreeBSD can get the serial number it will display it as it enumerates the drive.

geom disk list

This will print out information on all the geoms (i.e. drives), including the serial number as “ident”

camcontrol identify ada1

This gives you more information that you ever wanted about a particular drive. This does include the serial number.

diskinfo -s /dev/ada1

This simply prints the ident for the drive in question. You can specify multiple arguments so diskinfo -s /dev/ada? works (up to ten drives).

smartctl -i /dev/ada1

Smartctl is a utility for managing SATA drives (not SAS!), and you should probably install it. It’s part of Smartmontools as it gives you the ATA information for drive, including errors rates, current temperature and suchlike.

Whichever method works for you, once you’ve got your serial number you can identify the drive. Except, of course, if your drive is completely fubared. In that case get the serial numbers of the drives that aren’t and identify it by elimination.

Saving the partition table from the failed drive.

In readiness for replacing it, if you can save its partition table:

gpart backup ada1 >
gpart.ada1

If you can’t read it, just save another one from a different drive in the vdev set – they should be identical, right?

Swapping the bad drive out.

Please generate and paste your ad code here. If left empty, the ad location will be highlighted on your blog pages with a reminder to enter your code. Mid-Post

Next, pull your faulty drive and replace it with a new one. You might want to turn the power off, although it’s not necessary. However, it’s probably safer to reboot as we’re messing with the boot array.

Try zpool status, and you’ll see something like this:

  pool: zrpool: zr
state: DEGRADED
status: One or more devices could not be opened.
Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
scan: scrub in progress since Sun Feb 2 17:42:38 2025
356G scanned at 219/s, 175G issued at 108/s, 10.4T total
0 repaired, 1.65% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
  zr DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
ada0p3 ONLINE 0 0 0
  16639665213947936 UNAVAIL 0 0 0 was /dev/ada1p3
ada2p3 ONLINE 0 0 0
ada3p3 ONLINE 0 0 0

It’s complaining because it can’t find the drive with the identity 16639665213947936. ZFS doesn’t care where the drives in a vdev are plugged in, only that they exist somewhere. Device ada1 is ignored – it’s just got some random disk in that zfs isn’t interested in.

Setting up the replacement drive

So let’s get things ready to insert the new drive in the RAIDZ.

First restore its partition table:

gpart restore
/dev/ada1 < gpart.ada1

If you see “gpart: geom ‘ada1’: File exists”, just run “gpart destroy -F ada1”. Without the -F it may say the drive is in use, which we know it isn’t.

Next, if you’ve got a scrub going on, stop it with “zpool scrub -s zr”

As a sanity check, run “gpart show” and you should see four identical drives.

Boot sector and insertion

Now this is a boot from ZFS situation, common on a home server but not a big one. The guides from Solaris won’t tell you about this step. To make sure the system boots you need to have the boot sector on every drive (ideally). Do this with:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

And Finally… tell ZFS to insert the new drive:

 zpool replace ada1p3

Run “zpool status” and you’ll see it working:

pool: zr
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Feb 2 18:44:58 2025
1.83T scanned at 429M/s, 1.60T issued at 375M/s, 10.4T total
392G resilvered, 15.45% done, 0 days 06:48:34 to go
config:
NAME STATE READ WRITE CKSUM
zr DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
ada0p3 ONLINE 0 0 0
replacing-1 UNAVAIL 0 0 0
16639665213947936 UNAVAIL 0 0 0 was /dev/ada1p3/old
ada1p3 ONLINE 0 0 0
ada2p3 ONLINE 0 0 0
ada3p3 ONLINE 0 0 0
errors: No known data errors


It’ll chug along in the background re-silvering the whole thing. You can carry on using the system, but it’s performance may be degraded until it’s done. Take a look at the console to make sure there are no CAM errors indicating that the problem wasn’t the drive at all, and go to bed.
If you reboot or have a power cut while its rebuilding it will start from scratch, so try to avoid both!

In the morning, zpool status will return to this, and all will be well in the world. But don’t forget to order another cold spare so you’re ready when it happens again.

pool: zr
state: ONLINE
scan: resilvered 2.47T in 0 days 11:52:45 with 0 errors on Mon Feb 3 06:37:43 2025
config:
NAME STATE READ WRITE CKSUM
zr ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ada0p3 ONLINE 0 0 0
ada1p3 ONLINE 0 0 0
ada2p3 ONLINE 0 0 0
ada3p3 ONLINE 0 0 0
errors: No known data errors

As a final tip, if you use diskinfo -v adaX it will tell you the type of drive and other information, which is really handy if you’re ordering another cold spare.

# diskinfo -v adaa
ada2
512 # sectorsize
3000592982016 # mediasize in bytes (2.7T)
5860533168 # mediasize in sectors
4096 # stripesize
0 # stripeoffset
5814021 # Cylinders according to firmware.
16 # Heads according to firmware.
63 # Sectors according to firmware.
HGST HUS724030ALE640 # Disk descr. <- Drive to order on eBay!
PK2234P9GWJ42Y # Disk ident.
No # TRIM/UNMAP support
7200 # Rotation rate in RPM
Not_Zoned # Zone Mode


How hot is your FreeBSD machine?

FreeBSD may be the hottest operating system available, but having hot hardware isn’t so good. Modern drives and CPUs can report their temperature, but it’s not easy to see it.

I’ve produced an example script that will report the temperature of whatever it can, with the idea that it can form the basis of whatever you really need to do. I have a multi-server monitoring system using the same methods.

Getting the CPU core temperature is a simple sysctl variable, but it only appears if you have the coretemp.ko module loaded or compiled into the kernel. But coretemp.ko only works for Intel processors; for AMD you need amdtemp.ko instead.

The script tries to determine the CPU type the best way I know how, and loads the appropriate module if necessary. If it loads the module, it unloads the module at the end to leave things as it found them. You can omit the unload so it’s loaded on first, or put the module permanently in loader.conf if you prefer. But this is only an example, and I prefer to keep my scripts independent of host configuration.

Next you need a way to get the temperature from all your drives. This isn’t built in to FreeBSD but you can use the excellent the excellent smartmontools https://www.smartmontools.org by Bruce Allen and Christian Franke.

This was originally intended to access the SMART reporting on ATA drives, but will now extract information for SAS units too. To smartmontools, and the “smartctl” utility in particular, you can build from ports with:

cd /usr/ports/sysutils/smartmontools
make install clean

You can also install it as a binary package with: “pkg install smartmontools”

The script tries to enumerate the drives and CPUs on the system. ATA drives are in /dev/ and begin “ada”, SCSI and USB drives begin “da”. The trick is to figure out which devices are drives and which are partitions or slices within a drive – I don’t have a perfect method.

SCSI drives that aren’t disks (i.e. tape) start “sa”, and return some really weird stuff when smartctl queries them. I’ve tested the script with standard LTO tape drives, but you’ll probably need to tweak it for other things. (Do let me know).

Figuring out the CPUs is tricky, as discrete CPUs and cores within a single chip appear the same. The script simply goes on the cores, which results in the same temperature being reported for each.

You can override any of the enumeration by simply assigning the appropriate devices to a list, but where’s the fun? Seriously, this example shows how you can enumerate devices and it’s useful when you’re monitoring disparate hosts using the same script.

Finally, there are three loops that read the temperature for each device type into into “temp” and then print it. Do whatever you want – call “shutdown -p now” if you think something’s too hot; autodial the fire brigade or, as I do, send yourself an email.

The Script

#!/bin/sh
# FreeBSD Temperature monitoring example script
# (c) FJL 2024 frank@fjl.co.uk
# Please feel free to use this as an example for a few techniques
# including enumerating devices on a host and extracting temperature
# information.

# Full path to utilities in case run with no PATH set
GREP=/usr/bin/grep SMARTCTL=/usr/local/sbin/smartctl
CUT=/usr/bin/cut
SYSCTL=/sbin/sysctl

# Load the AMD CPU monitoring driver if necessary
if [ ! $($SYSCTL -n dev.cpu.0.temperature 2>/dev/null) ]
then
# Let's try to find out if we have Intel or
# AMD processor and select correct kernel module
if $SYSCTL -n hw.model | $GREP AMD >/dev/null
then
tempmodule=amdtemp
else
tempmodule=coretemp
fi
# Load the CPU temp kernel module
kldload $tempmodule
# Set command to unload it when we're done (optional)
unload="kldunload $tempmodule"
fi
# Enumerate SATA, USB and SAS disks - everything
# in /dev/ starting da or ada, except where there's an s or p in it
disks=$(find /dev -depth 1 \( -name da\* -o -name ada\* \) -and -not -name \*s\* -and -not -name \*p\* | cut -c 6- | sort )

# Enumerate other SCSI devices, starting in sa.
# Normally tape drives. May need tweaking!
scsis=$(find /dev -depth 1 \( -name sa\* \) -and -not -name \*.ctl | cut -c 6- | sort)

# Enumerate the CPUs
cpus="$(seq 0 $(expr $($SYSCTL -n hw.ncpu ) - 1))"

# Print all the ATA disks
for disk in $disks
do
temp=$($SMARTCTL -a /dev/$disk | $GREP Temperature_Celsius | $CUT -w -f 10)
echo "$disk: ${temp}C" done

# Print all the SCSI devices (e.g. tapes)
# NB. This will probably take a lot of fiddling as SCSI units return all sorts to smartctl
# Note the -T verypermissive. see man smartctl for details.
for scsi in $scsis
do
temp=$($SMARTCTL -a -T verypermissive /dev/$scsi | $GREP "Current Drive Temperature" | cut -w -f 4)
echo "$scsi: ${temp}C"
done

# Print all the CPUs
for cpu in $cpus
do
temp=$($SYSCTL -n dev.cpu.$cpu.temperature | $CUT -f 1 -d .)
echo "CPU$cpu: ${temp}C"
done

# Unload the CPU temp kernel module if we loaded it (optional)
$unload

Example Output

ada0: 35C
ada1: 39C
ada2: 39C
ada3: 38C
da0: 24C
da1: 25C
sa0: 33C
CPU0: 48C
CPU1: 48C

Why people obsess about the ZFS SLOG, but shouldn’t

There are two mysteries things on ZFS that cause a lot of confusion: The ZIL and the SLOG. This article is about what they are and why you should care, or not care about them. But I’ll come to them later. Instead I’ll start with POSIX, and what it says about writing stuff to disk files.

When you write to disk it can either be synchronous or asynchronous. POSIX (Portable Operating System Interface) has requirements for writes through various system calls and specifications.

With an asynchronous write the OS takes the data you give it and returns control to the application immediately, promising to write the data as soon as possible in the background. No delay. With a synchronous write the application won’t get control back until the data is actually written to the disk (or non-volatile storage of some kind). More or less. Actually, POSIX.1-2017 (IEEE Standard 1003.1-2017) doesn’t guarantee it’s written, but that’s the expectation.

You’d want synchronous writes for critical complex files, such as a database, where the internal structure would break if a transaction was only half written, and a database engine needs to know that one write has occurred before making another.

Writes to ZFS can be long and complicated, requiring multiple blocks be updated for a single change. This is how it maintains its very high integrity. However, this means it can take a while to write even the simplest thing, and a synchronous write could take ages (in computer terms).

To get around this, ZFS maintains a ZIL – ZFS Intent Log.

In ZFS, the ZIL primarily serves to ensure the consistency and durability of write operations, particularly for synchronous writes. But it’s not a physical thing; it’s a concept or list. It contains transaction groups that need to be completed in order.

The ZIL can be physically stored in three possible places…

In-Memory (Volatile Storage):

This is the default location. Initially, all write operations are buffered in RAM. This is where they are held before being committed to persistent storage. This kind ofZIL is volatile because it’s not backed by any permanent storage until written to disk.

Volatility doesn’t matter, because ZFS guarantees consistency with transaction groups (TXGs). The power goes off and the in-RAM ZIL is lost, the transactions are never applied; but the file system is in a consistent state.

In-Pool (Persistent Storage):

Without a dedicated log device, the ZIL entries are written to the main storage pool in transaction groups . This happens for both synchronous and asynchronous writes but is more critical for synchronous writes to ensure data integrity in case of system crashes or power failures.

SLOG (Separate Intent Log Device):

For better performance with synchronous writes, you can add a dedicated device to serve as the SLOG. This device is typically a low-latency, high-speed storage like a short-stroked Rapter, enterprise SSD or NVRAM. ZFS writes the log entries before they’re committed to the pool’s main storage.

By storing the pending TXGs on disk, either in the pool or on an SLOG, ZFS can meet the POSIX requirement that the transaction is stored in non-volatile storage before the write returns, and if you’re doing a lot of synchronous writes then storing them on a high-speed SLOG device helps. But only if the SLOG device is substantially faster than an array of standard drives. And it only matters if you do a lot of synchronous writes. Caching asynchronous writes in RAM is always going to be faster still

I’d contend that the only times synchronous writes feature heavily are databases and virtual machine disks. And then there’s NFS, which absolutely loves them. See ESXi NFS ZFS and vfs-nfsd-async for more information if this is your problem.

If you still think yo need an SLOG, install a very fast drive. These days an NVMe SLC NAND device makes sense. Pricy, but it doesn’t need to be very large. You can add it to a zpool with:

zpool add poolname
log /dev/daX

Where daX is the drive name, obviously.

As I mentioned, the SLOG doesn’t need to be large at all. It only has to cope with five seconds of writes, as that’s the maximum amount of time data is “allowed” to reside there. If you’re using NFS over 10Gbit Ethernet the throughput isn’t going to be above 1.25Gb a seconds. Assuming that’s flat-out synchronous writes, multiplying that by five seconds is less than 8Gb. Any more would be unused.

If you’ve got a really critical system you can add mirrored SLOG drives to a pool thus:

zpool add poolname
log /dev/daX /dev/daY

You can also remove them with something like:

zpool remove
poolname log /dev/daY

This may be useful if adding an SLOG doesn’t give you the performance boost you were hoping for. It’s very niche!

FreeBSD 14 ZFS warning

Update 27-Nov-23
Additional information has appeared on the FreeBSD mailing list:
https://lists.freebsd.org/archives/freebsd-stable/2023-November/001726.html

The problem can be reproduced regardless of the block cloning settings, and on FreeBSD 13 as well as 14. It’s possible block cloning simply increased the likelihood of hitting it. There’s no word yet about FreeBSD 12, but this FreeBSD’s own ZFS implementation so there’s a chance it’s good

In the post by Ed Maste, a suggested partial workaround is to set the tunable vfs.zfs.dmu_offset_next_sync to zero, which has been on the forums since Saturday. This is a result of this bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275308

There’s a discussion of the issue going on here:

https://forums.freebsd.org/threads/freebsd-sysctl-vfs-zfs-dmu_offset_next_sync-and-openzfs-zfs-issue-15526-errata-notice-freebsd-bug-275308.91136/

I can’t say I’m convinced about any of this.


FreeBSD 14, which was released a couple of days ago, includes OpenZFS 2.2. There’s a lot of suspicion amongst Gentoo Linux users that this has a rather nasty bug in it related to block cloning.

Although this feature is disabled by default, people might be tempted to turn it on. Don’t. Apparently it can lead to lost data.

OpenZFS 2.2.0 was only promoted to stable on 13th October, and in hindsight adding it to a FreeBSD release so soon may seem precipitous. Although there’s a 2.2.1 release you should now be using instead it simply disables it by default rather than fixing the likely bug (and to reiterate, the default is off on FreeBSD 14).

Earlier releases of OpenZFS (2.1.x or earlier) are unaffected as they don’t support block cloning anyway.

Personally I’ll be steering clear of 2.2 until this has been properly resolved. I haven’t seen conclusive proof as to what’s causing the corruption, although it looks highly suspect. Neither have I seen or heard of it affecting the FreeBSD implementation, but it’s not worth the risk.

Having got the warning out of the way, you may be wondering what block cloning is. Firstly, it’s not dataset cloning. That’s been working fine for years, and for some applications it’s just what’s needed.

Block cloning applies to files, not datasets, and it’s pretty neat – or will be. Basically, when you copy a file ZFS doesn’t actually copy the data blocks – it just creates a new file in the directory structure but it points to the existing blocks. They’re shared between the source and destination files. Each block has a reference count in the on-disk Block Reference Table (BRT), and only when a block in the new file changes does a copy-on-write occur; the new block is linked to the new file and the reference count in the BRT is decremented. In familiar Unix fashion, when the reference count for a block gets to zero it joins the free pool.

This isn’t completely automatic – it must be allowed when the copy is made. For example, the cp utility will request clone files by default. This is done using the copy_file_range() system call with the appropriate runes; simply copying a file with open(), read(), write() and close() won’t be affected.

As of BSDCAN 2023, there was talk about making it work with zvols but this was to come later, although clone blocks in files can exist between datasets as long as they’re using the same encryption (including keys).

One tricky problem here is how it works with the ZIL – for example what’s stopping a block pointer from disappearing from the log? There was a lot to go wrong, and it looks like it has.

Release notes for 2.2.1 may be found here.
https://github.com/openzfs/zfs/releases/tag/zfs-2.2.1

Quicken doesn’t like Amex QIF format

If you’re trying to import data into Quicken 98 (or compatible) having downloaded it from the American Express web site you’ll get some odd and undesirable results. There are two incompatibilities.

Firstly the date field (D) needs to have a two-digit year, whereas Amex adds the century.

Secondly, for reasons I haven’t quite figured out yet, the extra information M field sometimes results is a blank entry (other than the date).

You can fix this by editing the QIF file using an editor of your choice, but this gets tedious so here’s a simple program that will do it for you. It reads from stdin and writes to stdout so you’ll probably use it in a script or redirect it from and to a file. I compile it to “amexqif”. If there’s any interest I’ll tweak it to make it more friendly.

It’s hardly a complex program, the trick was figuring out what’s wrong in the first place.

#include <stdio.h>
#include <string.h>

char buf [200];

int main()
{
int lenread;
char *str;
        while ((str = fgets(buf, 200,stdin))) // (()) to placate idiot mode on CLANG
        {
                if ((lenread = strlen(str)) > 1)        // Something other than just the newline
                {
                        if (str[0] == 'M')
                                continue;       // Random stuff in M fields breaks Q98


                        if (str[0] == 'D' && lenread == 12)     // Years in dates must be two digit
                                strcpy (str+7, str+9);
                }
                printf("%s",str);
        }
}

Proper Case in a shell script

How do you force a string into proper case in a Unix shell script? (That is to say, capitalise the first letter and make the rest lower case). Bash4 has a special feature for doing it, but I’d avoid using it because, well, I want to be Unix/POSIX compatible.

It’s actually very easy once you’ve realised tr won’t do it all for you. The tr utility has no concept on where in the input stream it is, but combining tr with cut works a treat.

I came across this problem when I was writing a few lines to automatically create directory layouts for interpreted languages (in this case the Laminas framework). Languages of this type like capitalisation of class names, but other names have be lower case.

Before I get started, I note about expressing character ranges in tr. Unfortunately different systems have done it in different ways. The following examples assume BSD Unix (and POSIX). Unix System V required ranges to be in square brackets – e.g. A-Z becomes “[A-Z]”. And the quotes are absolutely necessary to stop the shell globing once you’ve introduced the square brackets!

Also, if you’re using a strange character set, consider using \[:lower:\] and \[:upper:\] instead of A-Z if your version of tr supports it (most do). It’s more compatible with foreign character sets although I’d argue it’s not so easy on the eye!

Anyway, these examples use A-Z to specify ASCII characters 0x41 to 0x5A – adjust to suit your tr if your Unix is really old.

To convert a string ($1) into lower case, use this:

lower=$(echo $1 | tr A-Z a-z)

To convert it into upper case, use the reverse:

upper=$(echo $1 | tr a-z A-Z)

To capitalise the first letter and force the rest to lower case, split using cut and force the first character to be upper and the rest lower:

proper=$(echo $1 | cut -c 1 | tr a-z A-Z)$(echo $1 | cut -c 2- | tr A-Z a-z)

A safer version would be:

proper=$(echo $1 | cut -c 1 | tr "[:lower:]" "[:upper:]")$(echo $1 | cut -c 2- | tr "[:upper:]" [":lower:"])

This is tested on FreeBSD in /bin/sh, but should work on all BSD and bash-based Linux systems using international character sets.

You could, if you wanted to, use sed to split up a multi-word string and change each word to proper case, but I’ll leave that as an exercise to the reader.

Cookie problems

It’s been ten years since the original EU ePrivacy Directive (Regulation of the European Parliament and of the Council concerning the respect for private life and the protection of personal data in electronic communications and repealing Directive 2002/58/EC) came into effect in the UK. It’s implement as part of the equally wordy Privacy and Electronic Communications Regulations 2012 (PECR) One thing it did was require companies with websites to allow users to opt in to cookies. I’ve written about this before, but since then the amount of tracking cookies has become insane, and most people would choose to opt out of that much survailance.

The problem is that with so many cookies, some websites have effectively circumvent the law by making it impractical to opt out of all of them. At first glance they offer an easy way to turn everything off and “save settings”, but what’s not so clear is that they hide an individual option for every tracking cookie company they have a deal with. And there can be hundreds, each of which needs to be individually switched off using a mouse.

These extra cookies – and they’re the ones you don’t want – are usually hidden behind a “vendors” or “partners” tab. With the site shown below this was only found by scrolling all the way down and clicking on a link.

This kind of thing is not in the spirit of the act, and web sites that do this do not “care about your privacy” in any way, shape or form. And if you think these opt-in/out forms look similar, it’s because they are. Consent Management Platforms like Didomi, OneTrust and Quantcast are widely used to either set, or obfuscate what you’re agreeing to.

An update to the ePrivicy directive is now being talked about that says it must be “easy” to reject these tracking cookies, which isn’t the case now.

Meanwhile some governments are cracking down. In January, Google and Facebook both got slapped with huge fines in France from the
Commission Nationale de l’Informatique et des Libertés, which reckoned that because it took more clicks to reject cookies than accept them, Google and Facebook were not playing fair.

“Several clicks are required to refuse all cookies, against a single one to accept them. [The committee] considered that this process affects the freedom of consent: since, on the internet, the user expects to be able to quickly consult a website, the fact that they cannot refuse the cookies as easily as they can accept them influences their choice in favor of consent. This constitutes an infringement of Article 82 of the French Data Protection Act.”

I’m inclined to agree. And on top of a fines of €150 and €60 respectively, they’re being hit with €100K for each extra day they allow the situation to remain.

Unfortunately we’re not likely to see this in the UK. The EU can’t actually agree on the final form of the ePrivacy regulations. The UK, of course, is no longer in the EU and may be able to pass its own laws while the EU argues.

Last year the Information Commissioner’s office did start on this, taking proposals to the G7 in September 2021. Elizabeth Denham told the meeting that a popup with a button saying “I Agree” wasn’t exactly informed consent.

However, this year the government is going in the other direction. It’s just published plans to do away with the “cookie popup” and allow websites to slurp your data. Instead sites will be required to give users clear information on how to opt out. This doesn’t fill me with confidence.

Also in the proposals is to scrap the need for a Data Protection Impact Assessment (DPIA) before slurping data, replacing it with a “risk-based privacy management programme to mitigate the potential risk of protected characteristics not being identified”.

I don’t like the idea of any of this. However, there’s a better solution – simply use a web browser that rejects these tracking cookies in the first place.

Reply-To: gmail spam and Spamassassin

Over the last few months I’ve noticed huge increase is spam with a “Reply To:” field set to a gmail address. What the miscreants are doing is hijacking a legitimate mail server (usually a Microsoft one) and pumping out spam advertising a service of some kind. These missives only work if the mark is able to reply, and as even a Microsoft server will be locked down sooner or later, so they’ll never get the reply.

The reason for sending this way is, of course, spam from a legitimate mail server isn’t going to be blacklisted or blocked. SPF and other flags will be good. So these spams are likely to land in inboxes, and a few marks will reply based on the law of numbers.

To get the reply they’re using the email “Reply-To:” field, which will direct the reply to an alternative address – one which Google is happy to supply them for nothing.

The obvious way of detecting this would be to examine the Reply-To: field, and if it’s gmail whereas the original sender isn’t, flag it as highly suspect.

I was about to write a Spamassassin rule to do just this, when I discovered there is one already – and it’s always been there. The original idea came from Henrik Krohns in 2009, but it’s time has now definitely arrived. However, in a default install, it’s not enabled – and for a good reason (see later). The rule you want is FREEMAIL_FORGED_REPLYTO, and it’s found in 20_freemail.cf

Enabling FREEMAIL_FORGED_REPLYTO in Spamassassin

If you check 20_freemail.cf you’ll see the rules require Mail::SpamAssassin::Plugin::FreeMail, The FreeMail.pm plugin is part of the standard install, but it’s very likely disabled. To enable this (or any other plugin) edit the init.pre file in /usr/local/etc/mail/spamassassin/ Just add the following to the end of the file:

# Freemail checks
#
loadplugin Mail::SpamAssassin::Plugin::FreeMail FreeMail.pm

You’ll then need to add a list of what you consider to be freemail accounts in your local.cf (/usr/local/etc/mail/spamassassin/local.cf). As an example:

freemail_domains aol.* gmail.* gmail.*.* outlook.com hotmail.* hotmail.*.*

Note the use of ‘*’ as a wildcard. ‘?’ matches a single character, but neither match a ‘.’. It’s not a regex! There’s also a local.cf setting “freemail_whitelist”, and other things documented in FreeMail.pm.

Then restart spamd (FreeBSD: service spamd restart) and you’re away. Except…

The problem with this Rule

If you look at 20_freemail.cf you’ll see the weighting is very low (currently 0.1). If this is such a good rule, why so little? The fact is that there’s a lot of spam appearing in this form, and it’s the best heuristic for detecting it, but it’s also going to lead to false positives in some cases.

Consider those silly “contact forms” beloved by PHP Web Developers. They send an email from a web server but with a “faked” reply address to the person filling in the form. This becomes indistinguishable from the heuristic used to spot the spammers.

If you know this is going to happen you can, of course add an exception. You can even have the web site use a local submission port and send it to a local mailbox without filtering. But in a commercial hosting environment this gets a bit complicated – you don’t know what Web Developers are doing. (How could you? They often don’t).

If you have control over your users, it’s probably safe to up the weighting. I’d say 3.0 is a good starting point. But it may be safer to leave it at 0.1 and examine the results for what would have been false positives.

Nothing new with Intel SDSi

Intel’s latest wheeze for its CPUs is Software Defined Silicone (SDSi). The deal is that you buy the CPU at one price and then pay extra for a license to enable more stuff.

If you want the geeky stuff about how it’s supposed to work in Linux, see here. https://github.com/intel/intel-sdsi

Basically, the CPU has an interface that you can access if you have an Authentication Key Certificate (AKC) and have purchased a Capability Activation Payload (CAP) code. This will then enable extra stuff that was previously disabled. Quite what the extra stuff is remains to be seen – it could be extra instructions or enabling extra cores on a multi-core chip, or enabling more of the cache. In other words, you buy extra hardware that’s disabled, and pay extra to use it. What’s even more chilling is that you could be continuously paying licenses for the hardware you’ve bought or it’ll stop working.

It’s not actually defining the silicone in software like a FPGA, as you’d expect from euphemistic name. Software Defined Uncrippling would be more honest, but a harder sell.

But this is nothing new. I remember IBM doing this with disk drives in the 1970’s. If you upgraded your drive to double the capacity an IBM tech turned up and removed a jumper, enabling the remaining cylinders. Their justification was that double the capacity meant double the support risk – and this stuff was leased.

Fast forward 20 years to Intel CPUS. Before the Intel 80486 chips you could provide whatever input clock you wanted to your 80386, just choosing how fast it went. Intel would guarantee the chip to run at a certain speed, but that was the only limiting factor. Exceed this speed at your own risk.

The thing was that the fast and slow CPUs were theoretically identical. It’s often the case with electronic components. However, manufacturing tolerances mean that not all components end up being the same, so they’re batch tested when the come off the line. Those that pass the toughest test get stamped with a higher speed and go in the fast bucket, where they’re sold for more. Those that work just fine at a lower speed go into the slower bucket and sell for less. Fair enough. Except…

It’s also the nature of chip manufacture that the process improves over time, so more of the output meets the higher test – eventually every chip is a winner. You don’t get any of the early-run slow chips, but you’re contracted to sell them anyway. The answer is to throw some of the fast chips into the slow bucket and sell them cheap, whilst selling others at premium price to maintain your margins.

In the early 1990’s I wrote several articles about how to take advantage of this in PCW, after real-world testing of many CPUs. It later became known as overclocking. I also took the matter up with Intel at the time, and they explained that their pricing had nothing to do with manufacturing costs, and everything to do with supply and demand. Fair enough – they were honest about it. This is why AMD gives you more bang-per-buck – they choose to make things slightly better and cheaper because that maximises their profits too.

With the introduction of the 80486, the CPU clock speed was set in the package so the chip would only run at the speed you paid for. SDSi is similar, except you can adjust the setting by paying more at a later date. It also makes technical sense – producing large quantities of just one chip has huge economies of scale. The yield improves, and you just keep the fab working. In order to have a product range you simply knobble some chips to make them less desirable. And using software to knobble them is the ultimate, as you can decide at the very last minute how much you want to sell the chip for, long after it’s packaged and has left the factory.

All good? Well not by me. This only works if you’re in a near monopoly position in the first place. Microsoft scalps its customers with licenses and residual income, and Intel wants in on that game. It’s nothing about being best, it’s about holding your customers to ransom for buying into your tech in the first place. This hasn’t hurt Microsoft’s bottom line, and I doubt it’ll hurt Intel’s either.

STARTTLS is not a protocol

As regular readers will know, I’m not a fan of STARTTLS but today I realised that some people are confused as to what it even means. And there’s a perfectly good reason for this – some graphical email software is actually listing STARTTLS as a protocol for talking to mail servers and people are jumping to conclusions.

So what is STARTTLS all about if you go back to basics?

Originally, when only nice people had access to computers, network traffic was unencrypted. If you had physical access to the network you could pretty much read anything you wanted to, as everything connected to the same network saw the same data. This isn’t true now, but encryption you data is a good idea just in case it can be intercepted – and if it’s going over the Internet that’s definitely the case.

In the mid 1990s, the original mass-market web browser, Netscape, decided to do something about it and they (or more specifically their chief scientist Taher Elgamal, invented a protocol called Secure Sockets Layer (SSL) to protect HTTP (web) traffic. Actually, several times as the first couple of attempts weren’t very secure at all.

SSL didn’t really fit in with the OSI model; it runs on top of the transport protocol (usually TCP) but under the presentation layer, which would logically handle encryption but doesn’t usually. To use it you need an SSL layer added to the stack to transparently do the deed on a particular port.

But, as a solution to the encryption problem, SSL took off and pretty much every major protocol has an SSL port along with its original cleartext one. So clear HTTP is on port 80, HTTPS is on port 443. Clear POP3 is on port 110, encrypted on 995. Clear IMAP is on port 143, encrypted on 993.

As is the way of genius ideas in cybersecurity, even the third version of SSL was found to be full of holes. SSL version 3.1, which was renamed TLS, continued plugging the leaks and by TLS 1.2 it’s considered pretty much secure now. TLS 1.3, which interoperates with TLS 1.2, simply deprecates certain cyphers and hashes on the suspicion they might be insecure; although anyone into cybersecurity should tell you that everything is secure only until it’s broken.

Unfortunately, because different levels of TLS use different cyphers and reject others, TLS levels are by no means interoperable. And neither is it the case that a newer version is more secure; bugs have been introduced and later fixed. This’d be fine if everything and everyone used the same version of TLS, but in the real world this isn’t practical – old hardware, in particular, bakes in old versions of SSL or TLS and if you decided to deprecate older cyphers and not work with them, you loose the ability to talk to your hardware.

But apart from this, things were going along pretty well; and then someone had the bright idea of operating encrypted and unencrypted connections on the same port by hacking it at the application layer instead. This was achieved by modifying the application protocol to include a STARTTLS command. If this is received, the application then negotiates a TLS connection. If the receiving host didn’t understand what STARTTLS meant it’d send back and error, and things could continue unencrypted.

In other words, if you’re implementing an SMTP server with STARTTLS, this keyword is added to the protocol and the SMTP server does something about it when it sees it.

What could go wrong?

Well quite a lot of things, actually. Because TLS doesn’t fit in to the OSI model, it’s actually very difficult to deal with the situation where a TLS connection is requested and agreed to but the TLS layer fails to agree on a cypher with an older or newer version on the other end. There’s no mechanism for passing this to the application to say “okay, let’s revert to Plan A”, and the connection tends to hang.

There’s also a problem with name-based virtual servers must all use the same host certificate because the TLS connection must be established before the application layer headers are transferred.

But perhaps my biggest gripe is that enabling STARTTLS makes encryption optional. You’re not enforcing encryption when you need to, and even if you think you are, STARTTLS connection are obviously vulnerable to a man-in-the-middle attack. You have no idea how many times TLS has been turned on and off between the two endpoints.

You might be tempted to think that optional encryption is better than none at all, but in reality it means you don’t care – and if you don’t care, don’t bother. It just leads to a false sense of security. And it can lead to interoperability problems. My advice is to use “always TLS” ports for sensitive data and turn off the old port.