mysql/mariadb error 35

I’ve had a problem with mysql failing to start with error 35 in the log file:

InnoDB: Unable to lock ./ibdata1, error: 35
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.

What to do? Google and you get a lot of Linux people saying that the answer is to reboot the box. Hmm. Well you don’t have to.

What causes the error is mysqld crashing, usually when system resources are exhausted. Rebooting will, indeed, unlock ibdata1 but so will killing the process that locks it. Yet the server isn’t running, so how can this be? Well actually part of it is – just not the part the service manager sees.

Run “ps -auxww | grep mysql” and you’ll find a few stray mysqld processes still running. Send them a kill, wait for them to exit and then restart the service. Obviously you can only do this and expect it to work if you’ve sorted out the resource problem.
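The steps above can be sketched as a couple of shell functions. This is a rough illustration rather than a polished tool: it assumes FreeBSD’s ps, that the leftover processes have mysqld or mariadbd somewhere in their command line, and the function names are my own invention.

```shell
#!/bin/sh
# Sketch: find leftover mysqld/mariadbd processes and ask them to exit.

# Reads "PID COMMAND..." lines on stdin and prints the PIDs whose command
# mentions mysqld or mariadbd. Writing the pattern as [m]ysqld stops it
# matching the awk process itself if that shows up in the listing.
mysql_pids() {
    awk '/[m]ysqld|[m]ariadbd/ { print $1 }'
}

# Sends a plain TERM (not -9) to each stray process and waits for it to go.
kill_strays() {
    for pid in $(ps -axww -o pid= -o command= | mysql_pids); do
        echo "Sending TERM to $pid"
        kill "$pid"
        while kill -0 "$pid" 2>/dev/null; do
            sleep 1
        done
    done
}

# Run kill_strays as root, then restart the service as usual.
```

Nothing happens until kill_strays is called, so you can check what it would hit first by piping the ps command into mysql_pids on its own.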

FreeBSD 14 ZFS warning

Update 27-Nov-23
Additional information has appeared on the FreeBSD mailing list:
https://lists.freebsd.org/archives/freebsd-stable/2023-November/001726.html

The problem can be reproduced regardless of the block cloning settings, and on FreeBSD 13 as well as 14. It’s possible block cloning simply increased the likelihood of hitting it. There’s no word yet about FreeBSD 12, but that uses FreeBSD’s own ZFS implementation so there’s a chance it’s good.

In the post by Ed Maste, a suggested partial workaround is to set the tunable vfs.zfs.dmu_offset_next_sync to zero, which has been on the forums since Saturday. This is a result of this bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275308
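For anyone wanting to try the suggested workaround, here’s a minimal sketch. The tunable name comes from the post above; the helper function, and the choice of /etc/sysctl.conf for making the setting stick, are my assumptions.

```shell
#!/bin/sh
# Sketch only. To apply the workaround to the running system (as root):
#   sysctl vfs.zfs.dmu_offset_next_sync=0

# persist_tunable FILE NAME VALUE
# Appends NAME=VALUE to FILE unless a line for NAME is already there,
# so the setting survives a reboot.
persist_tunable() {
    conf=$1 name=$2 value=$3
    if grep -q "^${name}=" "$conf" 2>/dev/null; then
        echo "$name already set in $conf, leaving it alone"
    else
        echo "${name}=${value}" >> "$conf"
    fi
}

# Hypothetical invocation on FreeBSD (run as root):
#   persist_tunable /etc/sysctl.conf vfs.zfs.dmu_offset_next_sync 0
```

Remember this is only described as a partial workaround, so treat it as reducing the odds rather than fixing anything.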

There’s a discussion of the issue going on here:

https://forums.freebsd.org/threads/freebsd-sysctl-vfs-zfs-dmu_offset_next_sync-and-openzfs-zfs-issue-15526-errata-notice-freebsd-bug-275308.91136/

I can’t say I’m convinced about any of this.


FreeBSD 14, which was released a couple of days ago, includes OpenZFS 2.2. There’s a lot of suspicion amongst Gentoo Linux users that this has a rather nasty bug in it related to block cloning.

Although this feature is disabled by default, people might be tempted to turn it on. Don’t. Apparently it can lead to lost data.

OpenZFS 2.2.0 was only promoted to stable on 13th October, and in hindsight adding it to a FreeBSD release so soon may seem precipitous. Although there’s a 2.2.1 release you should now be using instead, it simply disables block cloning by default rather than fixing the likely bug (and to reiterate, the default is off on FreeBSD 14).

Earlier releases of OpenZFS (2.1.x or earlier) are unaffected as they don’t support block cloning anyway.


Personally I’ll be steering clear of 2.2 until this has been properly resolved. I haven’t seen conclusive proof as to what’s causing the corruption, although it looks highly suspect. Neither have I seen or heard of it affecting the FreeBSD implementation, but it’s not worth the risk.

Having got the warning out of the way, you may be wondering what block cloning is. Firstly, it’s not dataset cloning. That’s been working fine for years, and for some applications it’s just what’s needed.

Block cloning applies to files, not datasets, and it’s pretty neat – or will be. Basically, when you copy a file ZFS doesn’t actually copy the data blocks – it just creates a new file in the directory structure but it points to the existing blocks. They’re shared between the source and destination files. Each block has a reference count in the on-disk Block Reference Table (BRT), and only when a block in the new file changes does a copy-on-write occur; the new block is linked to the new file and the reference count in the BRT is decremented. In familiar Unix fashion, when the reference count for a block gets to zero it joins the free pool.

This isn’t completely automatic – it must be allowed when the copy is made. For example, the cp utility will request clone files by default. This is done using the copy_file_range() system call with the appropriate runes; simply copying a file with open(), read(), write() and close() won’t be affected.

As of BSDCan 2023, there was talk about making it work with zvols, but this was to come later. Cloned blocks in files can, however, exist between datasets as long as they’re using the same encryption (including keys).

One tricky problem here is how it works with the ZIL – for example what’s stopping a block pointer from disappearing from the log? There was a lot to go wrong, and it looks like it has.

Release notes for 2.2.1 may be found here.
https://github.com/openzfs/zfs/releases/tag/zfs-2.2.1

Using ddrescue to recover data from a USB flash drive

If you’re in the data recovery, forensics or just storage maintenance business (including as an amateur) you probably already know about ddrescue. Released about twenty years ago by Antonio Diaz Diaz, it was a big improvement over the original concept, dd_rescue, written by Kurt Garloff in 1999. Both copy disk images (which are just files in Unix), extracting as much data as possible when the drive itself has faults.

If you’re using Windows rather than Unix/Linux then you probably want to get someone else to recover your data. This article assumes FreeBSD.

The advantage of using either of these over dd or cp is that they expect to find bad blocks in a device and can retry or skip over them. With dd the best you can do is tell it to ignore errors and continue (conv=noerror), and cp will just stop. ddrescue is particularly good at retrying failed blocks, and reducing the block size to recover every last readable scrap – and it treats mechanical drives that are on their last legs as gently as possible.

If you’re new to it, the manual for ddrescue can be found here. https://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html

However, for most use cases the command is simple. Assuming the device you want to copy is /dev/da1 and you’re calling it thumbdrive the command would be:

ddrescue /dev/da1 thumbdrive.img thumbdrive.map

The device data would be stored in thumbdrive.img, with ongoing state information stored in thumbdrive.map. This state information is important, as it allows ddrescue to pick up where it left off.

However, ddrescue was written before USB flash drives (pen drives, thumb drives or whatever). That’s not to say it doesn’t work, but they have a few foibles of their own. It’s still good enough that I haven’t modified the ddrescue base code to cope; instead I use a bit of shell script to do the necessary.

USB flash drives seem to fail in a different way to Winchester disks. If a block of Flash EPROM can’t be read it’s going to produce a read error – fair enough. But they have complex management software running on them that attempts to make Flash EPROM look like a disk drive, and this isn’t always that great in failure mode. In fact I’ve found plenty of examples where they come across a fault and crash rather than returning an error, meaning you have to turn them off and on to get anything going again (i.e. unplug them and put them back in).

So it doesn’t matter how clever ddrescue is – if it hits a bad block and the USB drive controller crashes then it’s going to be waiting forever for a response, and you’ll just have to come and reset everything manually and resume. One of the great features of ddrescue is that it can be stopped and restarted at any time, so continuing after this happens is “built in”.

In reality you’re going to end up unplugging your USB flash drive many times during recovery. But fortunately, it is possible to turn a USB device off and on again without unplugging it using software. Most USB hardware has software control over its power output, and it’s particularly easy on operating systems like FreeBSD to do this from within a shell script. But first you have to figure out what’s where in the device map – specifically which device represents your USB drive in /dev and which USB device it is on the system. Unfortunately I can’t find a way of determining it automatically, even on FreeBSD. Here’s how you do it manually; if you’re using a version of Linux it’ll be similar.

When you plug a USB storage device into the system it will appear as /dev/da0 for the first one; /dev/da1 for the second and so on. You can read/write to this device like a file. Normally you’d mount it so you can read the files stored on it, but for data recovery this isn’t necessary.

So how do you know which /dev/da## is your media? The easy way to tell is that it’ll appear on the console when you first plug it in. If you don’t have access to the console it’ll be in /var/log/messages. You’ll see something like this.

Jun 10 17:54:24 datarec kernel: umass0 on uhub5
kernel: umass0: <vendor 0x13fe USB DISK 3.0, class 0/0, rev 2.10/1.00, addr 2> on usbus1
kernel: umass0: SCSI over Bulk-Only; quirks = 0x8100
kernel: umass0:7:0: Attached to scbus7
kernel: da0 at umass-sim0 bus 0 scbus7 target 0 lun 0
kernel: da0: < USB DISK 3.0 PMAP> Removable Direct Access SPC-4 SCSI device
kernel: da0: Serial Number 070B7126D1170F34
kernel: da0: 40.000MB/s transfers
kernel: da0: 59088MB (121012224 512 byte sectors)
kernel: da0: quirks=0x3
kernel: da0: Write Protected

So this is telling us that it’s da0 (i.e. /dev/da0).

The hardware identification is “<vendor 0x13fe USB DISK 3.0, class 0/0, rev 2.10/1.00, addr 2> on usbus1” which means it’s on USB bus 1, address 2.
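If you’d rather not read the numbers off by eye, the bus and address can be pulled out of that line with sed. This is only a sketch: it assumes the umass line looks exactly like the one above, and usb_target is a name I’ve made up. You still have to find the right line yourself.

```shell
#!/bin/sh
# Sketch: turn a umass attach line (on stdin) into the "bus.address" form.
# Assumes the "addr N> on usbusM" layout shown in the log extract above.
usb_target() {
    sed -n 's/.*addr \([0-9][0-9]*\)> on usbus\([0-9][0-9]*\).*/\2.\1/p'
}

# Hypothetical use:
#   grep 'umass0: <' /var/log/messages | tail -n 1 | usb_target
```

For the example line this gives 1.2, meaning bus 1, address 2.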

You can confirm this using the usbconfig utility with no arguments:

ugen5.1:  at usbus5, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
...snip...
ugen1.1: at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
ugen1.2: at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (300mA)

There it is again, last line.

usbconfig has lots of useful commands, but the ones we’re interested in are power_off and power_on. No prizes for guessing what they do. However, unless you specify a target then it’ll switch off every USB device on the system – including your keyboard, probably.

There are two ways of specifying the target, but I’m using the -d method. We’re after device 1.2 so the target is -d 1.2

Try it and make sure you can turn your USB device off and on again. You’ll have to wait for it to come back online, of course.
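A quick way to try it is to wrap the two commands in a function. This is a sketch, assuming the device really is at 1.2 on your system; the function name and the OFF_SECS/ON_SECS variables are mine, and the delays are guesses to be tuned.

```shell
#!/bin/sh
# Sketch: power a single USB device off and back on again.
# $1 is the usbconfig target in bus.address form, e.g. 1.2
power_cycle() {
    usbconfig -d "$1" power_off || return 1
    sleep "${OFF_SECS:-5}"      # leave it off long enough to reset properly
    usbconfig -d "$1" power_on || return 1
    sleep "${ON_SECS:-15}"      # give the system time to find the device again
}

# Hypothetical use (as root):
#   power_cycle 1.2
```

Always passing the target as an argument means there’s no way to call it by accident without one and take out the keyboard.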

There are ways of doing this on Linux by installing extra utilities such as hub-ctrl. You may also be able to do it by writing stuff to /sys/bus/usb/devices/usb#/power/level – see the manual that came with your favourite Linux distro.

The next thing we need to do is provide an option for ddrescue so that it actually times out if the memory stick crashes. The default is to wait forever. The --timeout=25 or -T 25 option (depending on your optional taste) sees to that, making it exit if it hasn’t been able to read anything for 25 seconds. This isn’t entirely what we’re after, as a failed read would also indicate that the drive hadn’t crashed. Unfortunately there’s no such tweak for ddrescue, but failed reads tend to be quick so you’d expect a good read within a reasonable time anyway.

So as an example of putting it all into action, here’s a script for recovering a memory stick called duracell (because it’s made by Duracell) on USB bus 1 address 2.

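#!/bin/sh
# Re-run ddrescue until it exits cleanly, power cycling the stick after each failure.
until ddrescue -T 25 -u /dev/da0 duracell.img duracell.map
do
        echo "ddrescue returned $?"     # with "while !" this would always print 0
        usbconfig -d 1.2 power_off
        sleep 5
        usbconfig -d 1.2 power_on
        sleep 15
        echo Restarting
done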

A few notes on the above. Firstly, ddrescue’s return code isn’t defined. However, it appears to do what one might expect so the above loop will drop out if it ever completes. I’ve set the timeout for time since last good read to 25 seconds, which seems about right. Turning off the power for 5 seconds and then waiting for 15 seconds for the system to recognise it may be a bit long – tune as required. I’m also using the -u option to tell ddrescue to only go forward through the drive as it’s easier to read the status when it’s always incrementing. Going backwards and forwards makes sense with mechanical drives, but not flash memory.

Aficionados of ddrescue might want to consider disabling scraping and/or trimming (probably trimming) but I’ve seen it recover data with both enabled. Data recovery is an art, so tweak away as you see fit – I wanted to keep this example simple.

Now this system isn’t perfect. I’m repurposing ddrescue, which does a fine job on mechanical drives, to recover data from a very different animal. I may well write a special version for USB Flash drives but this method does actually work quite well. Let me know how you get on.

Quicken doesn’t like Amex QIF format

If you’re trying to import data into Quicken 98 (or compatible) having downloaded it from the American Express web site you’ll get some odd and undesirable results. There are two incompatibilities.

Firstly the date field (D) needs to have a two-digit year, whereas Amex adds the century.

Secondly, for reasons I haven’t quite figured out yet, the extra information M field sometimes results in a blank entry (other than the date).

You can fix this by editing the QIF file using an editor of your choice, but this gets tedious so here’s a simple program that will do it for you. It reads from stdin and writes to stdout so you’ll probably use it in a script or redirect it from and to a file. I compile it to “amexqif”. If there’s any interest I’ll tweak it to make it more friendly.

It’s hardly a complex program, the trick was figuring out what’s wrong in the first place.

#include <stdio.h>
#include <string.h>

char buf [200];

int main()
{
        int lenread;
        char *str;

        while ((str = fgets(buf, sizeof buf, stdin))) // (()) to placate idiot mode on CLANG
        {
                if ((lenread = strlen(str)) > 1)        // Something other than just the newline
                {
                        if (str[0] == 'M')
                                continue;       // Random stuff in M fields breaks Q98

                        // Years in dates must be two digit. 'D', a ten character date and
                        // the newline make 12. Source and destination overlap, so this
                        // needs memmove() rather than strcpy().
                        if (str[0] == 'D' && lenread == 12)
                                memmove(str+7, str+9, strlen(str+9)+1);
                }
                printf("%s",str);
        }
        return 0;
}
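If you don’t fancy compiling anything, much the same two fixes can be had from sed. This is an alternative sketch, not the program above: the fix_qif name and the file names are made up, and it’s slightly stricter in that it only shortens D lines that really look like nn/nn/nnnn.

```shell
#!/bin/sh
# Sketch: the same two QIF fixes as a sed filter, stdin to stdout.
#   1. Drop every M (memo) line.
#   2. Cut a four digit year on a D line down to its last two digits.
fix_qif() {
    sed -e '/^M/d' \
        -e 's#^\(D[0-9][0-9]/[0-9][0-9]/\)[0-9][0-9]\([0-9][0-9]\)$#\1\2#'
}

# Hypothetical use:
#   fix_qif < amex.qif > quicken.qif
```

Like the C version it leaves every other line alone, including the ^ record separators.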

Having a good argument

I’ve seen all sorts of stuff on forums about how to process command line arguments in C or C++. What a load of fuss and bother. There’s a standard getopt() function in the ‘C’ library, similar to the shell programming command, but it’s not great.

The main problem with getopt() is that it produces its own error message. Although this saves you the trouble, it can be a bit unfriendly for the user, especially when things get complex. For example, you might have mutually exclusive arguments and want to print out a suitable message. That said, it’ll work most of the time.
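For comparison, this is roughly what the shell equivalent looks like. With the getopts built-in, a leading colon in the option string tells it to keep quiet so you can print your own messages. The option letters match the C examples further down; the function name and the message wording are mine, so take it as a sketch.

```shell
#!/bin/sh
# Sketch: -a and -b are flags, -f takes a value, everything else is an argument.
parse_opts() {
    OPTIND=1                    # start from the first argument each time
    while getopts ":abf:" opt; do
        case $opt in
            a)  echo "Option a" ;;
            b)  echo "Option b" ;;
            f)  echo "Value for f=$OPTARG" ;;
            :)  echo "Option -$OPTARG needs a value" ;;
            \?) echo "Bad switch $OPTARG" ;;
        esac
    done
    shift $((OPTIND - 1))       # drop the options, leaving the real arguments
    for arg in "$@"; do
        echo "Processing $arg"
    done
}

# Hypothetical use:
#   parse_opts -ab -f myfile one two
```

It handles -ab, -fmyfile and -f myfile, and “--” ends the options, so the behaviour is close to the hand-rolled C version.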

Here’s a page showing a good explanation for the GCC version, which is pretty standard.

Example of Getopt (The GNU C Library)

But rolling your own is not hard. Here’s a skeleton I use. It’s pretty self-explanatory. My rules allow single options (e.g. -a -b) or combined options (e.g. -ab), or any mixture. “--” ends options, meaning subsequent arguments are going to be actual arguments.

If you want to pass something as an option, such as a filename, you can. -fmyfile or -f myfile are both handled in the example.

You can add code to detect a long option by adding if (!strcmp(p,"longname")) … just after the char c. But I don’t like long options.

#include <stdio.h>

void process(char *s)
{
    printf("Processing %s\n",s);
}

int main (int cnt, char **a)
{
    int i;
    for (i=1; i<cnt && a[i][0] == '-'; i++)
    {
        char *p = &a[i][1];
        char c;
        if (*p == '-')
        {
            i++;
            break;
        }
        while (c = *p)
            switch (*p++)
            {
            case 'a':
                printf("Option a\n");
                break;
                
            case 'b':
                printf("Option b\n");
                break;

            case 'f':
                if (!*p && i+1 < cnt)
                    printf("Value for f=%s\n", a[++i]);
                else
                {
                    printf("Value for f=%s\n", p);
                    while (*p)
                        p++;
                }
                break;
                
            default:
                printf("Bad switch %c\n",c);

            }
    }
    for (;i<cnt;i++)
        process(a[i]);
}

The above code assumes that options precede arguments. If you want to mix them the following code allows for complete anarchy – but you can end it using the “--” option, which will take any following flags as arguments. As a bonus it shows how to add a long argument.

#include <stdio.h>
#include <string.h>

void process(char *s)
{
    printf("Processing %s\n",s);
}

int main (int cnt, char **a)
{
    int i;
    int moreargs=1;

    for (i=1; i<cnt; i++)
    {
            if (moreargs && a[i][0] == '-')
            {
            char *p = &a[i][1];
            char c;
                if (*p == '-')
                {
                    moreargs = 0;
                    continue;
                }
                if (!strcmp(p,"long"))  // Matches -long
                {
                    printf("Long argument\n");
                    continue;   // No i++ here: the for loop moves on, an extra one would skip the next argument
                }

                while (c = *p)
                    switch (*p++)
                    {
                    case 'a':
                        printf("Option a\n");
                        break;

                    case 'b':
                        printf("Option b\n");
                        break;

                    case 'f':
                        if (!*p && i+1 < cnt)
                            printf("Value for f=%s\n", a[++i]);
                        else
                        {
                            printf("Value for f=%s\n", p);
                            while (*p)
                                p++;
                        }
                        break;

                        default:
                            printf("Bad switch %c\n",c);

            }
        }
    else
        process(a[i]);
    }
}



Proper Case in a shell script

How do you force a string into proper case in a Unix shell script? (That is to say, capitalise the first letter and make the rest lower case). Bash4 has a special feature for doing it, but I’d avoid using it because, well, I want to be Unix/POSIX compatible.

It’s actually very easy once you’ve realised tr won’t do it all for you. The tr utility has no concept of where in the input stream it is, but combining tr with cut works a treat.

I came across this problem when I was writing a few lines to automatically create directory layouts for interpreted languages (in this case the Laminas framework). Languages of this type like capitalisation of class names, but other names have to be lower case.

Before I get started, a note about expressing character ranges in tr. Unfortunately different systems have done it in different ways. The following examples assume BSD Unix (and POSIX). Unix System V required ranges to be in square brackets – e.g. A-Z becomes “[A-Z]”. And the quotes are absolutely necessary to stop the shell globbing once you’ve introduced the square brackets!

Also, if you’re using a strange character set, consider using [:lower:] and [:upper:] instead of A-Z if your version of tr supports it (most do). It’s more compatible with foreign character sets although I’d argue it’s not so easy on the eye!

Anyway, these examples use A-Z to specify ASCII characters 0x41 to 0x5A – adjust to suit your tr if your Unix is really old.

To convert a string ($1) into lower case, use this:

lower=$(echo $1 | tr A-Z a-z)

To convert it into upper case, use the reverse:

upper=$(echo $1 | tr a-z A-Z)

To capitalise the first letter and force the rest to lower case, split using cut and force the first character to be upper and the rest lower:

proper=$(echo $1 | cut -c 1 | tr a-z A-Z)$(echo $1 | cut -c 2- | tr A-Z a-z)

A safer version would be:

proper=$(echo "$1" | cut -c 1 | tr "[:lower:]" "[:upper:]")$(echo "$1" | cut -c 2- | tr "[:upper:]" "[:lower:]")

This is tested on FreeBSD in /bin/sh, but should work on all BSD and bash-based Linux systems using international character sets.
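Wrapped up as a function it looks something like this. It’s a sketch of the same cut and tr trick; the function name is my choice, and I’ve used printf rather than echo so that odd strings (ones starting with a dash, say) aren’t mangled.

```shell
#!/bin/sh
# Sketch: print $1 with the first character upper case and the rest lower case.
proper() {
    first=$(printf '%s\n' "$1" | cut -c 1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s\n' "$1" | cut -c 2- | tr '[:upper:]' '[:lower:]')
    printf '%s%s\n' "$first" "$rest"
}

# Hypothetical use:
#   class=$(proper "$1")
```

The quotes around the character classes matter for the same reason as before: without them the shell may try to glob the square brackets.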

You could, if you wanted to, use sed to split up a multi-word string and change each word to proper case, but I’ll leave that as an exercise for the reader.
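If sed isn’t your thing, a plain loop will also do the multi-word case. This is one possible sketch and not the sed answer to the exercise: it lets the shell split the string on white space, so runs of spaces collapse to one, and both function names are mine.

```shell
#!/bin/sh
# Sketch: proper-case every word in a string.

# Same cut and tr trick as before, for a single word.
proper() {
    first=$(printf '%s\n' "$1" | cut -c 1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s\n' "$1" | cut -c 2- | tr '[:upper:]' '[:lower:]')
    printf '%s%s\n' "$first" "$rest"
}

# $1 is left unquoted in the for loop on purpose so the shell splits it
# into words. That also means wildcards such as * would be expanded,
# so don't feed it any.
proper_words() {
    out=
    for word in $1; do
        out="${out:+$out }$(proper "$word")"
    done
    printf '%s\n' "$out"
}

# Hypothetical use:
#   title=$(proper_words "the quick BROWN fox")
```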

Cookie problems

It’s been ten years since the original EU ePrivacy Directive (Regulation of the European Parliament and of the Council concerning the respect for private life and the protection of personal data in electronic communications and repealing Directive 2002/58/EC) came into effect in the UK. It’s implemented as part of the equally wordy Privacy and Electronic Communications Regulations 2012 (PECR). One thing it did was require companies with websites to allow users to opt in to cookies. I’ve written about this before, but since then the number of tracking cookies has become insane, and most people would choose to opt out of that much surveillance.

The problem is that with so many cookies, some websites have effectively circumvented the law by making it impractical to opt out of all of them. At first glance they offer an easy way to turn everything off and “save settings”, but what’s not so clear is that they hide an individual option for every tracking cookie company they have a deal with. And there can be hundreds, each of which needs to be individually switched off using a mouse.

These extra cookies – and they’re the ones you don’t want – are usually hidden behind a “vendors” or “partners” tab. With the site shown below this was only found by scrolling all the way down and clicking on a link.

This kind of thing is not in the spirit of the act, and web sites that do this do not “care about your privacy” in any way, shape or form. And if you think these opt-in/out forms look similar, it’s because they are. Consent Management Platforms like Didomi, OneTrust and Quantcast are widely used to either set, or obfuscate what you’re agreeing to.

An update to the ePrivacy directive is now being talked about that says it must be “easy” to reject these tracking cookies, which isn’t the case now.

Meanwhile some governments are cracking down. In January, Google and Facebook both got slapped with huge fines in France from the Commission Nationale de l’Informatique et des Libertés, which reckoned that because it took more clicks to reject cookies than accept them, Google and Facebook were not playing fair.

“Several clicks are required to refuse all cookies, against a single one to accept them. [The committee] considered that this process affects the freedom of consent: since, on the internet, the user expects to be able to quickly consult a website, the fact that they cannot refuse the cookies as easily as they can accept them influences their choice in favor of consent. This constitutes an infringement of Article 82 of the French Data Protection Act.”

I’m inclined to agree. And on top of fines of €150 million and €60 million respectively, they’re being hit with €100K for each extra day they allow the situation to remain.

Unfortunately we’re not likely to see this in the UK. The EU can’t actually agree on the final form of the ePrivacy regulations. The UK, of course, is no longer in the EU and may be able to pass its own laws while the EU argues.

Last year the Information Commissioner’s office did start on this, taking proposals to the G7 in September 2021. Elizabeth Denham told the meeting that a popup with a button saying “I Agree” wasn’t exactly informed consent.

However, this year the government is going in the other direction. It’s just published plans to do away with the “cookie popup” and allow websites to slurp your data. Instead sites will be required to give users clear information on how to opt out. This doesn’t fill me with confidence.

Also in the proposals is to scrap the need for a Data Protection Impact Assessment (DPIA) before slurping data, replacing it with a “risk-based privacy management programme to mitigate the potential risk of protected characteristics not being identified”.

I don’t like the idea of any of this. However, there’s a better solution – simply use a web browser that rejects these tracking cookies in the first place.

Digital Postage Stamps

The Royal Mail hasn’t just lost your item, it’s lost the plot completely.

While the news media has been obsessed with what civil servants might have been doing after work in Downing Street they have overlooked the latest bonkers development from the Post Office – “digital stamps”.

The gimmick is that every new stamp will have a 2D barcode on one side. According to the Royal Mail’s Nick Landon, “Introducing unique barcodes on our postage stamps allows us to connect the physical letter with the digital world and opens up the possibilities for a range of new innovative services in future.” This was followed by promises that it would be possible to link the codes to videos, and by scanning them with an App you could send “birthday messages” and other videos.

Just because something’s possible, it doesn’t mean it’s a good idea, Nick! But what’s the harm, eh?

Well look a bit further – from the start of 2023 you won’t be able to use any of your existing stamps. That’s right – they’re being withdrawn. In a statement Royal Mail has said:

“Mail posted with non-barcoded Definitive stamps after 31 January 2023, will be treated in the same way as if there is insufficient postage on an item….Any item that has insufficient postage is subject to a surcharge. Surcharge fees can be found on our website.”

What you’re supposed to do now is find all your “old fashioned” stamps and post them off to the Royal Mail, who will send you the new digital ones in return. What a waste of time and money – theirs and ours. Why not just accept the old stamps people have paid for until they run out? I’ve asked but received no further comment.

So let’s just assume the Royal Mail hasn’t completely lost its senses and there’s a better reason for this than using an App to “send” Shaun the Sheep videos, or to make money by cancelling stamps already paid for that people won’t get around to replacing in time.

One answer would be to make the stamps machine readable. Possibly, but that’d also make them much easier to forge. You could machine-read an existing stamp anyway; barcode technology is quicker and more forgiving, which is also its weakness.

Perhaps they’re worried about counterfeit stamps? Printing a barcode isn’t difficult. Unless…

I’ve looked at the stamps and they’ve got what’s probably a 47×16 matrix. Allowing for ECC and alignment marks that’s still going to be something like a 480-bit number – enough to give every stamp printed its own serial number from now until the end of time. This would also explain how scanning one could be used to deliver a unique video message to the recipient. If this is the plan – every stamp is unique – they could spot when the same stamp passed through their scanners twice, thus spotting when a forgery has been used.

The flaw in this brilliant plan is that the Royal Mail will have no way of telling if the stamp it’s currently scanning is the original or the forgery. If a forger has used your stamp number before you did, I predict an almighty row.

Reply-To: gmail spam and Spamassassin

Over the last few months I’ve noticed a huge increase in spam with a “Reply-To:” field set to a gmail address. What the miscreants are doing is hijacking a legitimate mail server (usually a Microsoft one) and pumping out spam advertising a service of some kind. These missives only work if the mark is able to reply, and as even a Microsoft server will be locked down sooner or later, a reply sent back to the hijacked account would never reach them.

The reason for sending this way is, of course, spam from a legitimate mail server isn’t going to be blacklisted or blocked. SPF and other flags will be good. So these spams are likely to land in inboxes, and a few marks will reply based on the law of numbers.

To get the reply they’re using the email “Reply-To:” field, which will direct the reply to an alternative address – one which Google is happy to supply them for nothing.

The obvious way of detecting this would be to examine the Reply-To: field, and if it’s gmail whereas the original sender isn’t, flag it as highly suspect.
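To make the heuristic concrete, here’s a rough sketch of the test in shell. It is only an illustration of the idea and has nothing to do with how Spamassassin implements it: it assumes simple, unfolded From: and Reply-To: headers, only knows about gmail.com, and both function names are invented.

```shell
#!/bin/sh
# Sketch: flag a message whose Reply-To: is gmail when the From: is not.

# Reads a message on stdin and prints the lower-cased domain from the
# first header called $1. Stops at the blank line that ends the headers.
addr_domain() {
    sed -n -e '/^$/q' -e "s/^$1:.*@\([A-Za-z0-9.-]*\).*/\1/p" |
        head -n 1 | tr 'A-Z' 'a-z'
}

# Reads a message on stdin. Succeeds (exit 0) if it looks suspect.
forged_freemail_reply() {
    msg=$(cat)
    from=$(printf '%s\n' "$msg" | addr_domain From)
    reply=$(printf '%s\n' "$msg" | addr_domain Reply-To)
    [ "$reply" = "gmail.com" ] && [ "$from" != "gmail.com" ]
}

# Hypothetical use:
#   forged_freemail_reply < message.eml && echo "Highly suspect"
```

A message with no Reply-To: at all, or one sent from gmail in the first place, passes without comment.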

I was about to write a Spamassassin rule to do just this, when I discovered there is one already – and it’s always been there. The original idea came from Henrik Krohns in 2009, but its time has now definitely arrived. However, in a default install, it’s not enabled – and for a good reason (see later). The rule you want is FREEMAIL_FORGED_REPLYTO, and it’s found in 20_freemail.cf.

Enabling FREEMAIL_FORGED_REPLYTO in Spamassassin

If you check 20_freemail.cf you’ll see the rules require Mail::SpamAssassin::Plugin::FreeMail. The FreeMail.pm plugin is part of the standard install, but it’s very likely disabled. To enable this (or any other plugin) edit the init.pre file in /usr/local/etc/mail/spamassassin/. Just add the following to the end of the file:

# Freemail checks
#
loadplugin Mail::SpamAssassin::Plugin::FreeMail FreeMail.pm

You’ll then need to add a list of what you consider to be freemail accounts in your local.cf (/usr/local/etc/mail/spamassassin/local.cf). As an example:

freemail_domains aol.* gmail.* gmail.*.* outlook.com hotmail.* hotmail.*.*

Note the use of ‘*’ as a wildcard. ‘?’ matches a single character, but neither matches a ‘.’. It’s not a regex! There’s also a local.cf setting “freemail_whitelist”, and other things documented in FreeMail.pm.

Then restart spamd (FreeBSD: service spamd restart) and you’re away. Except…

The problem with this Rule

If you look at 20_freemail.cf you’ll see the weighting is very low (currently 0.1). If this is such a good rule, why so little? The fact is that there’s a lot of spam appearing in this form, and it’s the best heuristic for detecting it, but it’s also going to lead to false positives in some cases.

Consider those silly “contact forms” beloved by PHP Web Developers. They send an email from a web server but with a “faked” reply address to the person filling in the form. This becomes indistinguishable from the heuristic used to spot the spammers.

If you know this is going to happen you can, of course add an exception. You can even have the web site use a local submission port and send it to a local mailbox without filtering. But in a commercial hosting environment this gets a bit complicated – you don’t know what Web Developers are doing. (How could you? They often don’t).

If you have control over your users, it’s probably safe to up the weighting. I’d say 3.0 is a good starting point. But it may be safer to leave it at 0.1 and examine the results for what would have been false positives.

It’s the LAW (GDPR as an excuse)

GDPR

In the 2000s it was “It’s necessary for our QA procedure”. Now it’s GDPR. Basically, the technical sounding response to shut people up when they complain. As a qualified ISO-9000 auditor I used to have a lot of fun calling their bluff in the first case.

With data protection it might seem more clear cut than having an encyclopaedic knowledge of ISO9000:2000. After all DPA 2018 (that which implemented GDPR) isn’t that dissimilar to its predecessors, and has a much tighter scope. However, it’s more open for interpretation and we’re waiting for some test cases.

However, what it doesn’t cover are situations like this:

Dear Mr Leonhardt, 
Hope you're well; It is law to speak to the account holder.
Kind regards, Salvin Tingh
Morrisons Online Customer Service Team

I won’t bore you with the full details of what led to this attempted put-down, but briefly I emailed Morrisons about a mistake they’d made on an order. On receiving no response I called (and they sorted it out efficiently, over the phone). A week later I got an email response, and I said it was too late but it was sorted out, thanks very much. A week later, another reply that suggested they hadn’t read the first one. I said “Sorted, thanks, and I’ll just use the ‘phone in future”.

Next week’s reply was along the lines that they couldn’t verify I was the customer. I replied that perhaps they should have tried (they know my email address and telephone number), but don’t worry it’s sorted. A week later the above arrived (name changed to protect the guilty).

Leaving aside the principles of good customer service – if you need to check someone’s identity before solving a problem then do so – one might wonder what law he might be talking about. You see, data protection laws are not as wide-ranging as people think.

Basically, the law relates to sensitive information about an identifiable individual. Stronger protections exist depending on the sensitivity of the information (e.g. race, religion, biometrics and the usual stuff). But if it’s not sensitive information about an identifiable individual it’s definitely out-of-scope.

In this case, Mr Tingh was dealing with a customer’s problem. He wasn’t being asked to divulge sensitive information to a possible third party. It’s possible (and desirable) that company procedures required that he make sure it really was the customer complaining, but that’s hardly “the law”. And had I been an imposter claiming I hadn’t received my sausage, the worst that would happen was someone else got a couple of quid refunded unexpectedly. Does Morrisons get that kind of thing often, one wonders?

And it also begs the question: if they were so concerned about whether a customer complaint about an order, emailed in with the full paperwork, really was from the household in question, why not simply pick up the phone or check the email address? Neither of these is foolproof, but in the circumstances one might have thought this good enough. Did he want me to visit the shop and show the manager my passport?

But to reiterate, The Data Protection Act (colloquially referred to as GDPR) is there to protect information pertaining to an individual. A company would have a duty to ensure it was talking to the right person if giving out sensitive information, but when someone is reporting the non-delivery of a vegan sausage to the supplier there is no sensitive information involved. They only need to check your identity if it’s really necessary.

Other protections in the DPA include transparent use of an individual’s data, not storing more than is necessary or for longer than necessary, and ensuring it’s accessible to the individual concerned, not leaked and is accurate (corrected if needs be). The European GDPR added provisions for portability, forcing companies to make your data available to competing services at your request.

So when someone tries to fob you off with “data protection”, stop and think if the above actually apply. And if you’re trying to fob someone off, don’t try to bluff a data security expert.