Bad software and security

Today, LinkedIn decided I might want to see this post from Thomas Barnett:

“Most businesses don’t get hacked because of bad software.
They get hacked because someone clicked.”

(Apologies if the link doesn’t work – it’s a Microsoft site).

He is, of course, quite correct that phishing and trojans are the most exploitable vulnerability in most organisations, but it hinges on the term “bad software”. If you’re new to the world of computing you won’t remember a time when this wasn’t a problem, but it has become one largely thanks to Microsoft putting profits before security, with features to “create a richer user experience”. I’d classify this as “bad software”, and it very much is the cause.

In the early days of the Internet there was assumed security, as any miscreants could be apprehended by the system operator checking which terminal they were on and paying them a visit. Unencrypted data flew back and forth on the network without it being a huge risk, as access to the wires was controlled by bricks and mortar. It took a while to add encryption when the Internet went public, but that’s done. Logins require security certificates. SMTP relays are closed. It should all be good.

Then some fools decided it would be “cool” to embed software in network traffic.

“Let’s allow people to send executable files by email that look like any other file and can be opened by clicking on them.” Bad software.

“Let’s embed JavaScript in web pages so we can run stuff on the user’s machine.” Bad software.

“Let’s embed software in Microsoft Office documents.” Bad software.

“Let’s use passwords for accessing important data instead of security certificates tied to a host.” Bad software.

There are other forms of idiocy around, such as downloading software from a package repo, placed there by anyone on the Internet, simply because there are so few actual software engineers around who can configure a server without a Docker image. But using passwords to log into remote systems, encrypted or otherwise, where the user has no way of knowing whether it’s the real login page is nuts. So is embedding software in electronic communications.

A cybersecurity industry has built up trying to place mitigations on this bad software. People like me have been warning about the fundamental dangers of Microsoft’s “Rich user experience” mantra for decades. I remember writing a piece for PCW in November 1996 when Netscape (not Microsoft, for once) proposed adding JavaScript to web browsers. (Previously Java Applets* were theoretically sandboxed).

Before this, when Microsoft added WordBasic to Word for Windows and DOS, people like me who’d been at the forefront of anti-malware in the 1980s were sarcastically asking “What could possibly go wrong?”

So Mr Barnett is right to say these things are the most effective attack vector. Organisations should be very afraid. But they’re only attack vectors because the software is bad.

*JavaScript and Java share the name but are nothing like the same thing.

The real reason AWS failed

Amazon is about to shed 14,000 jobs to “stay nimble”. Would it be rude to tell the Senior Vice President of People Experience and Technology at Amazon that this bird has already flown? Engineers trying to use AWS will tell you that the days of Amazon’s amazing technical prowess are well behind them. And it’s across the board – from their smart devices (has anyone tried the new Alexa app?) to AWS services that only half work. Only their retail system remains best-in-class.

Amazon blamed the recent DNS outage that took so many of their customers offline a week ago on a “race condition” in which the DynamoDB back-end to their unified DNS solution simply failed. They explained it in great detail here:

https://aws.amazon.com/message/101925/

What they didn’t say was why it failed; why the race condition existed; why they screwed up. I’ll hazard a guess that they made the people who really knew how their systems worked redundant by mistake, replacing them with new and cheaper hires to fill the hole in a spreadsheet. There was no one left to spot the flaw until it became manifest. Engineers are interchangeable, right? And you just need the right number to fit the project plan.

You don’t need large teams of qualified people to make this stuff work. You need small teams of experienced people who stick with the job and are treated with enough respect that they’re empowered to do it. The good ones are not going to stick around to play redundancy roulette every six months, hoping that HR actually understand they’re necessary to keep the show on the road. HR take pride in saying they’re “people people”. Good engineers are not people people; they’re more likely to be neurodiverse. Their managers are unlikely to understand what they do, and HR certainly won’t.

I dare say there is a lot of dead wood in Amazon. They were recruiting like crazy during the pandemic – anyone with a pulse who looked vaguely like an engineer. The trick is identifying who to keep; if, indeed, there is anyone left who they haven’t made redundant already, or who simply got too spooked and left.

Malware turns your mouse into spy microphone

Researchers at the University of California, Irvine have discovered that high-resolution gaming mice have a high enough sample rate to act as a microphone, using special software installed on the PC. Vibrations in the air, transferred to the mouse mat, can be picked up by the sensor, filtered, and turned into recognisable speech in the right circumstances.

Users of normal mice have little to worry about but, as the researchers point out, anti-malware vendors don’t currently treat mouse input as an attack vector.

Read all about it here:

Invisible Ears at Your Fingertips: Acoustic Eavesdropping via Mouse Sensors

https://arxiv.org/html/2509.13581v1

Unix/BSD users are not in danger, as the attack won’t work with a keyboard.

Airports “hacked” by ransomware gang

I’m looking at media reporting of the disruption caused to airports by the latest ransomware attack and I’m once again struck by the lack of detail. The victims are, as always, tight-lipped about it, and this translates in the media to “we don’t know what happened apart from it was an attack”.

Anyone who knows how this stuff works will have a pretty good idea what went down. So let’s look at the Collins Aerospace system at the heart of it. It’s reported as being MUSE, but it’s actually cMUSE.

cMUSE stands for common-use Multi-User System Environment, and it allows airlines to share check-in desks. It’s what’s known as a common-use passenger processing system, or CUPPS. When the self-loading cargo presents itself at the check-in, it tracks their bags using integration with systems like BagLink, sorts out boarding stuff and so on. Its main competitor, if you look at it that way, is SITA’s BagManager, but this only handles and tracks luggage.

Now here’s the thing – cMUSE makes a big thing of being cloud based. It runs on AWS. A SaaS product. It is possible to run it on your own infrastructure, but they sell the benefits of not needing your own servers and expensive IT people to manage it – just let them do it for everyone on AWS.

So what went wrong? They haven’t said, but a penny to a pound it’s the AWS version that got hit. This is why so many airlines got their check-in hijacked in one go. A nice juicy target for the ransomware gangs.

At Heathrow, I believe it’s deployed on over 1,500 terminals on behalf of more than 80 airlines. It’s used in over 100 airports worldwide, which isn’t a huge share of the total number (there are over 2,000 big ones according to the ACI), but it’s been sold extensively to the big European ones – high-traffic multi-carrier hubs. The ones that matter. Heathrow renewed for another six-year contract this April.

Collins claims going to AWS will save $100K per airport, but that must seem like a false economy right now. Its predecessor, vMUSE, dates from before cloud-mania, and users of the legacy system must be feeling quite smug. Many airports run a hybrid of cMUSE and vMUSE, and it’s hard to know the mix.

Ottawa International went cloud with a fanfare in 2017, and Shannon Airport chugged down the Kool-Aid, renewing cloud-only in 2025. Heathrow is likely mostly cloud. Cincinnati/Northern Kentucky and Indira Gandhi International (Delhi) are publicly known to be cloud users. What’s the betting Brussels and Berlin Brandenburg are on the list? Lesser problems at Dublin and Cork, which use the system, suggest they’re hybrid or still on vMUSE.

Subscribing to a cloud service for anything important is such a bad idea. You’re only as safe as your cloud provider. There’s no such thing as a virtual air-gap, and large-scale attacks are only possible because everyone’s using the same service. If airports are only saving $100K by switching, they’d be much better off having servers on-site and paying someone to look after them – part-time, if it’s really such a small amount of money in question.

If you want a games server in the cloud go ahead. If my business depended on it, I’d want to know where my data was and who could get at it.

FCA and Government want to scrap 2FA on contactless payments

You couldn’t make this up. The government has ordered the Financial Conduct Authority to come up with bright ideas to stimulate the economy (after other government policy has sent it in the opposite direction). Their latest wheeze, published today, is to make contactless card payments unlimited in value. I kid you not.
https://www.fca.org.uk/news/press-releases/proposed-contactless-changes-could-increase-convenience-consumers

This means that if someone steals a contactless card they can spend as much as they like from your bank account. But don’t worry, the FCA says the banks will have to refund you if it happens.

Part of their justification is that digital wallets (Apple Pay and Google Pay) allow much higher contactless transactions than the current £100 limit. For anything over £100, the card system asks for a PIN to prove it’s really you. Even with that safeguard, criminals can make a series of transactions of around £90 each before the bank’s fraud systems detect something suspicious.

You might remember that the contactless limit was £30 from 2015 until the pandemic, after which it was raised to £45, and then to £100 in 2021, to reduce the amount of contaminated cash in circulation. It was never reduced again, which some say was a mistake.

The difference between physical cards and Apple Pay/Google Wallet is that the latter require you to unlock the phone first, which is arguably more secure than a four-digit PIN. Claiming that because these are unlimited, PIN security should be stripped from physical cards is the craziest thing I’ve heard in years. And the FCA is going out of its way to blame the government.

LetsEncrypt, acme.sh and Apachectl reloads

This morning I woke up to an expired TLS certificate on this blog. This is odd, as it’s automatically renewed from LetsEncrypt using acme.sh, kicked off by a cron job. So what went wrong?

I don’t write about LetsEncrypt or ACME much, as I don’t understand everything about it and it keeps surprising me. But I have discovered a problem with FreeBSD running the latest Apache 2.4 in a jail. As I run my web servers in jails, this applies to me.

I like acme.sh. It’s a shell script. Very clever. No dependencies. Dependencies are against my religion. Why would anyone use a more complex system when there’s something simple that works?

For convenience, the certificates are renewed outside of the jails, and the sites are created using a script that sets it all up for me. One source of certificates for multiple jails is easier to manage, and it handles sites on other hosts using a simple NFS mount.

When you use acme.sh to renew a certificate for Apache you need to be able to plonk something on the web site. This is easy enough – the certificate host (above the jails) can get access either directly through the filing system or via NFS. It then gets the new certificate and copies it into the right place. When you first issue yourself a certificate you specify the path where you want the certificate to go, and the path to the web site. You also specify the command needed to get your web server to reload. It magically remembers this stuff so the cron job just goes along and does them all. But that’s where the fun starts.
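
To make the moving parts concrete, here’s roughly what the initial issue and install look like. This is only a sketch – the domain, jail name and paths are made up for illustration, but the flags are standard acme.sh ones:

acme.sh --issue -d example.com -w /jail/web/data/www/example.com
acme.sh --install-cert -d example.com \
        --cert-file /jail/web/data/certs/example.com/cert.pem \
        --key-file /jail/web/data/certs/example.com/cert.key \
        --fullchain-file /jail/web/data/certs/example.com/fullchain.pem \
        --reloadcmd "service -j web apache24 restart"

After that, the nightly “acme.sh --cron” renews and redeploys everything using the remembered settings.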

I rehosted the blog on a new instance of Apache, and created a new temporary website to make sure SSL worked – getting acme.sh to issue it a certificate in the process. All good, except I noticed that inside a jail, the new version of Apache stops but doesn’t restart after an “apachectl graceful”. The same with “apachectl reload”. Not great, but I tried using “service -j whatever apache24 restart”. A bit drastic but it worked, and I’ve yet to figure out why other methods like “jexec whatever apachectl graceful” stall.

So what happened this morning at 6am? There were some certificates to renew, and “acme.sh --cron” accidentally KOed Apache. It’s the first time any had expired.

Running acme.sh manually and restarting Apache by hand worked, but it’s hardly the dream of automation promised by Unix. Debugging the script, I found it was issuing a graceful restart command, when I thought I’d specified something more emphatic. So I started grepping for the line it was using, assuming it must be in a config file somewhere. Nothing.

Long story short, I eventually found where it had hidden the command: in .acme.sh/domain.name/domain.name.conf, in spite of having looked there already. It turns out to be the line “Le_ReloadCmd=”, and it’s unique for each domain (a sensible idea), but it’s base64 encoded instead of being plain text! And it’s wrapped between “_ACME_BASE64__START_” and “_ACME_BASE64__END_”. I assume this is done to avoid difficulties with certain characters in shell scripts, but it makes it a pain to edit. You can create a new command by piping it through base64 and editing very carefully, but readable it ain’t.
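
If you do want to edit it in place, something along these lines generates the replacement – the jail name and reload command are examples, and the marker strings are as described above:

# Encode a new reload command the way acme.sh stores it
NEWCMD=$(printf '%s' 'service -j web apache24 restart' | base64)
echo "Le_ReloadCmd='_ACME_BASE64__START_${NEWCMD}_ACME_BASE64__END_'"

Then paste the output over the existing Le_ReloadCmd= line in domain.name.conf.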

There is another way – just recopy the certificate. Unfortunately you need to know, and use, the same options as when you originally created it – you can’t just issue a different --reloadcmd. You can check these by looking at the domain.name.conf file, where fortunately they are stored in plain text. Assuming they’re all the same, this little script will do them all for you at once. Adjust as required.

#!/bin/sh

# Make sure you're in the right directory
cd ~/.acme.sh

# Jail containing web site, assumed all the same.
WJAIL=web

# TEST can be set in the environment to pass extra options to acme.sh
# (e.g. --debug); left unset it does nothing.

for DOM in $(find . -type d -depth 1 | sed "s|^\./||")
do

echo acme.sh $TEST -d $DOM  --install-cert \
        --cert-file /jail/$WJAIL/data/certs/$DOM/cert.pem \
        --key-file /jail/$WJAIL/data/certs/$DOM/cert.key \
        --fullchain-file /jail/$WJAIL/data/certs/$DOM/Fullchain.pem \
        --reloadcmd "service -j $WJAIL apache24 restart"
done

You will notice that this only echoes the command needed, so if anyone’s crazy enough to copy/paste it then it won’t do any damage. Remove the “echo” when you’re satisfied it’s doing the right thing for you.

Or you could just edit all the conf files and replace the Le_ReloadCmd= line – you only have to generate it once, after all.

FreeBSD/Linux as Fibre Broadband router

British Telecom, bless them, has decided that copper telephone lines have to go and is forcing everyone onto fibre Internet and VoIP. Except rural customers currently connected to the Internet using a wet piece of string, if they’re lucky, of course.

Incidentally, “Fibre Broadband” is a nonsense in a technical sense, but the battle is lost – the public believes broadband is any Internet connection to the home that isn’t dial-up.

Although I’ve written about routing on FreeBSD before, I thought it was time for an update. Why route on FreeBSD? Because unlike the cheap and nasty “routers” supplied by domestic (and some commercial) ISPs, it doesn’t crash. You don’t have to turn it off and on again. And it does what it’s told, with great diagnostics. You can also run plenty of other services on the same box if it’s powerful enough, or your throughput is modest.

Most of this should work fine on Linux, although its networking is generally considered less efficient than the real thing. However, at less than 1Gbps on a single line this isn’t going to matter, if it matters at all. With Linux you get less of the nuts and bolts built into the base system, so you may have to install extra packages depending on which distribution you are using. But this is all standard stuff, so it shouldn’t be too difficult. It’s the settings that matter, and they’re probably the reason you’re reading this!

In this first article I’ll just consider a gateway router with NAT, and leave DNS, DHCP and other options until later.

Setting up PPPoE using user-ppp

First off, your WAN connection. With FTTC and FTTP this is normally a little white box – either a VDSL modem or an ONT. It connects to the phone line or fibre cable on one end, and has an RJ45 on the other that looks like Ethernet, because it is Ethernet. I’m going to call them Ethernet Modems, as they’re treated the same for our purpose. However, being Ethernet won’t do you much good as it’s just talking a protocol called PPPoE – or Point-to-Point Protocol over Ethernet.

PPP is an old protocol for making an Internet connection using dial-up, but it’s evolved (or suffered mission creep) and it’s now rather complicated thanks to all the baggage. Fortunately you can ignore the baggage and concentrate on the PPPoE stuff, once you know which is which. And that’s always the trick.

You’ll need a host (i.e. computer) with two Ethernet ports unless you want a complicated life. If you’re using an old PC with just one, you can get away with a USB3 Ethernet adapter, but having a couple of server-grade NICs on the motherboard or add-on cards is the best way to go. Very generally, Intel or Broadcom are good choices; Realtek is at the low end.

You need to connect your Ethernet Modem to one port on your host and the other port goes to the LAN.

If your Ethernet Modem and the host you’re planning to use as a router are in different places you can connect them using a VLAN. It’s proper Ethernet and can be switched. Without a VLAN it’s not so simple, so plug it in using a direct cable.
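
For what it’s worth, creating the VLAN interface on FreeBSD is a one-liner. A minimal sketch, assuming VLAN 10 carried over physical NIC bge1:

ifconfig vlan10 create vlan 10 vlandev bge1
ifconfig vlan10 up

You’d then point ppp at vlan10 instead of the physical interface (“set device PPPoE:vlan10” in the profile below).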

PPP is built into the base system of FreeBSD. Type ppp (as root) and it’ll start up in interactive mode. If it doesn’t, you’re not using BSD, and therefore lack a base system, and will have to install it as a package. You might like to start here: https://tldp.org/HOWTO/PPP-HOWTO/

Although you can compile PPP support into the kernel, the ppp we’re talking about is a program written by Toshiharu OHNO and Brian SOMERS in the early 1990s, and part of BSD since FreeBSD and OpenBSD 2. It’s the normal straightforward way of doing things.

ppp has a simple config file in /etc/ppp/ppp.conf. It can contain profiles for multiple services in sections, with the service name being arbitrary, and ending in a colon (“:”). You specify the service when you run it, and stuff in other sections is ignored. This is a hangover from the days when people had multiple dial-up connections.

Here’s a service definition for Cloudscape, one of my favourite ISPs, but other UK FTTP services will be similar or identical. UK FTTC and SoGEA modems are pretty much the same too.

cloudscape:
  delete default                # May already have a
                                # default route configured elsewhere
  set device PPPoE:bge1
  set authname user-name-supplied-by-ISP
  set authkey password-supplied-by-ISP
  set dial
  set login
  set lcp
  set mru 1492
  set mtu 1492
  nat enable no               # Turn off NAT explicitly!
  disable ipv6cp              # Turn off IPv6
  enable ipcp                 # Turn on IPv4 (default)
#  enable lqr                 # Turn on Link Quality Requests
                              #   (detect dropped line)
  enable echo                 # Enable echo for LQR
  iface name wan0
  add default HISADDR

The ppp program was originally used for serial PPP connections to dial-up ISPs or organisations, but here we’re just using it for PPPoE. In support of switching ISPs it can add stuff to config files like resolv.conf and the routing table, which in the old days tended to be dynamic.

Feel free to read the manual to see what the options above do, but briefly: I’m starting by deleting the default route, which probably won’t exist unless you’ve configured one (possibly using DHCP), but if it does it will cause problems when ppp adds another.

  set device PPPoE:bge1

This says we’re using PPPoE over the bge1 Ethernet card. Obviously set this to the Ethernet card to which your Ethernet Modem (e.g. ONT) is attached.

  set authname user-name-supplied-by-ISP
  set authkey password-supplied-by-ISP

These are the user name and password supplied by your ISP. They tend to be low security, but are needed by the protocol for historic reasons.

  set dial
  set login
  set lcp

This will cause ppp to dial, log in and get the connection details. Some people will try to tell you that Internet lines are configured with DHCP – that’s for LANs. On a point-to-point connection, LCP (Link Control Protocol) and its companion IPCP provide the same function, such as telling you what your IP address is and which DNS servers to use.

  set mru 1492
  set mtu 1492

PPPoE adds eight bytes of protocol data (a six-byte PPPoE header plus a two-byte PPP protocol field) to every packet, so a standard 1500-byte Ethernet payload won’t fit 1:1 inside a PPPoE frame. Reducing the MTU to 1492 gets around this and avoids fragmentation, which is a good thing. LCP might suggest or force a lower MTU, but there’s no harm in specifying it.

    nat enable no

If you’re running on a single IP address, or you don’t mind all of your source addresses being the same IP, you can leave this line out. It shouldn’t be necessary anyway: according to the manual you have to explicitly enable NAT in ppp if you want it. But some bright spark has added the “-nat” option to the FreeBSD service management scripts, so you get it by default whether you asked for it or not. This has the effect of making every source address leaving the tunnel map to the first address configured, leading to some interesting anomalies when you make an outgoing connection. It doesn’t affect incoming address mapping, probably because Brian Somers didn’t envisage a whole subnet using it back in 1998.

  disable ipv6cp              # Turn off IPv6
  enable ipcp                 # Turn on IPv4 (default)
#  enable lqr                 # Turn on Link Quality Requests
                              #  (detect dropped line)
  enable echo                 # Enable echo for LQR

This disables IPv6 and enables IPv4 (which is on by default anyway). If you want to use IPv6 your service provider needs to support it, and most don’t.

LQR is probably not going to be necessary for our purposes and generates warnings, so I’ve left the line in but commented it out for now. The enable echo therefore has no effect.

  iface name wan0

By default, ppp will name its connections tun0, tun1 and so on (tun being tunnel). This means that you never know what the interface is going to be called, as other tunnels may exist before you start this one. We’re going to be referring to the interface in the PF firewall, so it helps to be sure what its name will be. The line above sets the name manually, and I’ve called it wan0, which is logical. You may, of course, have multiple WAN connections including dial-up backups, so giving them a sensible name is, er, sensible. You can call it anything you like if you’re nuts.

  add default HISADDR

This is an example of ppp messing with your system configuration – in this case it’s taking the far end’s IP address learned during negotiation, represented by the macro HISADDR, and adding it as the default route. If you have a static IP address you might want to set the route statically in the normal way.

Likewise, if you add the line “enable dns” it will take the DNS servers offered by the ISP and add them to resolv.conf. It won’t remove them afterwards, and may well end up messing up whatever local DNS arrangements you have, so I prefer to do this manually.

Once you’ve edited ppp.conf you can test it out interactively with “ppp cloudscape” and see what happens. Type “dial” and it should make the connection, and wan0 should appear in your list of network interfaces. Use netstat -r to see if the new default route has appeared.
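
A test session might look something like this – interface and profile names as above, with the ppp> prompt elided:

ppp cloudscape          # starts the interactive ppp> prompt
# at the ppp> prompt, type: dial
# then, from another shell:
ifconfig wan0           # should show the negotiated addresses
netstat -r              # the default route should point at the far end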

Setting up the pf firewall

user-ppp is a large program that tries to do everything, including NAT and being a firewall. This isn’t very UNIX-like in philosophy, but you can use these facilities if you like. I prefer to have a dedicated standard firewall, PF, and leave it to do everything firewall-like in one place.

If you’re setting up a router you’re probably going to need asymmetric NAT. Your /etc/pf.conf file will look something like this:

scrub in all
WAN=wan0
WANIP=1.2.3.4
nat pass on $WAN from 192.168.1.0/24 to any -> $WANIP
#rdr pass on $WAN proto tcp from any to $WANIP port 80 -> 192.168.1.123

The WAN IP comes from your ISP, although you will be able to see it using “ifconfig wan0” if you don’t have it to hand. I’m assuming your LAN is 192.168.1.0/24 – just set this to whatever you’re using. And that’s about it.

As a bonus, the commented-out example line at the end would redirect external port 80 to a web server on LAN address 192.168.1.123 – an open port. Peter Hansteen has written an excellent book on PF, called “The Book of PF”, which will tell you everything you need to know, and PF is well documented in various online handbooks and man pages, unlike user-ppp’s built-in firewall.

The only reason for using user-ppp’s NAT is if you’re on a dynamic IP address, in which case add “nat enable yes” to your profile and ppp_nat=yes to /etc/rc.conf.

Kicking it all off

First you need to enable routing:

 sysctl net.inet.ip.forwarding=1

This will work until reboot, and you can turn it off again by setting it to zero if something bad happens, like your NIC catching fire. Then dial your ISP (Cloudscape in this example):

ppp -ddial cloudscape

You should now have a connection to the Internet on the BSD box. Now enable PF for NAT.

service pf start (or onestart)

If it’s already running, use “service pf reload” to load the new config. At this point every machine on the LAN should be able to use your router’s LAN IP address as a gateway.

When you’re happy it works, to make this kick off automatically on boot, modify /etc/rc.conf:

sysrc ppp_enable=yes
sysrc ppp_mode=ddial
sysrc ppp_profile="cloudscape"
sysrc pf_enable=yes
sysrc gateway_enable=yes

Optionally run “sysrc ppp_nat=yes” if you’re not using PF for NAT. Or, if you’re editing rc.conf directly:

pf_enable=yes
gateway_enable=yes

ppp_enable="YES"
ppp_mode="ddial"
#ppp_nat="YES"	# We let PF do NAT
ppp_profile="name_of_service_provider"

I will do a part two to this post explaining how to configure DNS and DHCP, although there’s no reason these need to be on the same host you’re using as a router. In fact it’s good practice to separate them and have more than one DHCP and DNS server if you have the resources.

I hope you found it useful – any questions, add a comment below.

(Updated 06-Oct-2025 when I discovered the FreeBSD service scripts added -nat to the ppp startup without telling anyone).

How to tell if a host is up without ping

Some people seem to think that disabling network pings (ICMP echo requests, to be exact) is a great security enhancement: if attackers can’t ping something, they won’t know it’s there. It’s called Security through Obscurity, and only a fool would live in this paradise.

But supposing you have something on your network that disables pings and you, as the administrator, want to know if it’s up? My favourite method is to send an ARP request for the IP address in question and see if you get a response.

ARP is how you translate an IP address into a MAC address to get the Ethernet packet to the right host. If you want to send an Ethernet packet to 1.2.3.4 you put out an ARP request: “Hi, if you’re 1.2.3.4 please send your MAC address to my MAC address”. If a device doesn’t respond to this then it isn’t up on the Ethernet network with that IP address at all.

You can quickly write a program to do this in ‘C’, but you can also do it using a shell script, and here’s a proof of concept.

#!/bin/sh
! test -n "$1" && echo $0: Missing hostname/IP && exit
#arp -d $1  >/dev/null 2>/dev/null
ping -t 1 -c 1 -q $1 >/dev/null
arp $1 | grep -q "expires in" && echo $1 is up. && exit
echo $1 is down.

You run this with a single argument (hostname or IP address) and it will print out whether it is down or up.

The first line is simply the shell needed to run the script.

Line 2 bails out if you forget to add an argument.

Line 3, which is commented out, deletes the host from the ARP cache if it’s already there. This probably isn’t necessary in reality, and you need to be root user to do it. IP address mappings are typically deleted after 20 minutes, but as we’re about to initiate a connection in line 4 it’ll be refreshed anyway.

Line 4 sends a ping to the host. We don’t care if it replies. The timeout is set to the minimum 1 second, which means there’s a one second delay if it doesn’t reply. Other ways of tricking the host into replying exist, but every system has ping, so ping it is here.

Line 5 will print <hostname> is up if there is a valid ARP cache entry, which can be determined by the presence of “expires in” in the output. Adjust as necessary.

The last line, if still running, prints <hostname> is down. Obviously.

This only works across Ethernet – you can’t get an ARP resolution on a different network (i.e. once the traffic has gone through a router). But if you’re on your organisation’s LAN and looking to see if an IoT device is offline, lost or stolen, then this is a quick way to poll it and check.
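
For example, assuming the script above is saved as arp-up.sh, a quick and dirty sweep of a /24 might look like this (the subnet is illustrative):

#!/bin/sh
# Poll every address on 192.168.1.0/24 using the ARP trick above
for N in $(seq 1 254)
do
        ./arp-up.sh 192.168.1.$N
done | grep "is up"

Bear in mind the one-second ping timeout means a sweep full of down hosts will take a few minutes.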

Why can’t I ping my Amazon Echo?

The simple answer is that current Amazon Echo devices don’t respond to a ping – or, technically, an ICMP echo request. There’s a lot of waffle on the web saying this is because they’re too simple to do it, but this isn’t the case. The original Echo (at least before software updates) and the 8-inch Echo Show most certainly did respond to a ping, but the functionality has been dropped since then. Some people naively think it’s a security risk, part of a doctrine known as Security Through Obscurity. As it’s easy enough to find an Echo without a ping, it’s only a slight inconvenience to a would-be attacker and a big inconvenience to a network administrator.

Most later Echos do have open ports, however, so you can check whether the device is alive because the port will be there. I emphasise “open”, as Echos use quite a lot of ports that aren’t always open, for things like setup or communicating out. But these ports are open and can be connected to – even if the connection is refused, it shows there’s something there to refuse it.

Based on my incomplete collection of Echo devices, they have the following characteristics:

Model                                   Ping?   Ports
Original Echo
Echo Dot (fourth generation)                    1080, 6543, 8888
Echo Flex                                       1080, 8888
Echo Dot (second generation)                    1080, 8888
Echo Dot (third generation)                     1080, 8888
Echo Show 8-inch (second generation)    Y       8009
Echo Spot (first generation)
Echo Show 5-inch
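
Given those ports, a quick way to poke a suspected Echo from a Unix box is to try connecting to each one – a sketch, with an illustrative IP address:

#!/bin/sh
# Probe the ports Echos commonly hold open; adjust the address to suit
for P in 1080 6543 8888 8009
do
        nc -z -w 2 192.168.1.50 $P && echo "port $P open"
done

nc -z just tests whether the connection succeeds, without sending any data.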

So how can you reliably tell if your Amazon Echo device is alive on the network? Rather than messing around with ports, my favourite way is to send it an Ethernet ARP request and see if you get a reply. I did say disabling ping was a fool’s solution to security.

See here for how to do this.

Using ddrescue to recover data from a USB flash drive

If you’re in the data recovery, forensics or just storage maintenance business (including as an amateur) you probably already know about ddrescue. Released about twenty years ago by Antonio Diaz Diaz, it was a big improvement on the original concept, Kurt Garloff’s dd_rescue from 1999. Both copy disk images (which are just files in Unix), trying to extract as much data as possible when the drive itself has faults.

If you’re using Windows rather than Unix/Linux then you probably want to get someone else to recover your data. This article assumes FreeBSD.

The advantage of using either of these over dd or cp is that they expect to find bad blocks in a device and can retry or skip over them. Plain dd will stop at the first read error unless told otherwise, and cp will just give up. ddrescue is particularly good at retrying failed blocks, and at reducing the block size to recover every last readable scrap – and it treats mechanical drives that are on their last legs as gently as possible.

If you’re new to it, the manual for ddrescue can be found here: https://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html

However, for most use cases the command is simple. Assuming the device you want to copy is /dev/da1 and you’re calling it thumbdrive the command would be:

ddrescue /dev/da1 thumbdrive.img thumbdrive.map

The device data would be stored in thumbdrive.img, with ongoing state information stored in thumbdrive.map. This state information is important, as it allows ddrescue to pick up where it left off.

However, ddrescue was written before USB flash drives (pen drives, thumb drives or whatever) were common. That’s not to say it doesn’t work, but they have a few foibles of their own. It’s still good enough that I haven’t modified the ddrescue code to cope; a bit of shell script does the necessary.

USB flash drives seem to fail in a different way to Winchester disks. If a block of Flash EPROM can’t be read, it’s going to produce a read error – fair enough. But they have complex management software running on them that attempts to make Flash EPROM look like a disk drive, and this isn’t always that great in failure mode. In fact I’ve found plenty of examples where they come across a fault and crash rather than returning an error, meaning you have to turn them off and on again to get anything going (i.e. unplug them and put them back in).

So it doesn’t matter how clever ddrescue is – if it hits a bad block and the USB drive controller crashes, it’s going to be waiting forever for a response and you’ll just have to reset everything manually and resume. One of the great features of ddrescue is that it can be stopped and restarted at any time, so continuing after this happens is “built in”.

In reality you’re going to end up unplugging your USB flash drive many times during a recovery. Fortunately, it is possible to turn a USB device off and on again without unplugging it. Most USB hardware has software control over its power output, and on operating systems like FreeBSD it’s particularly easy to do this from within a shell script. But first you have to figure out what’s where in the device map – specifically, which device in /dev represents your USB drive, and which USB device it is on the system. Unfortunately I can’t find a way of determining this automatically, even on FreeBSD. Here’s how you do it manually; if you’re using a version of Linux it’ll be similar.

When you plug a USB storage device into the system it will appear as /dev/da0 for the first one; /dev/da1 for the second and so on. You can read/write to this device like a file. Normally you’d mount it so you can read the files stored on it, but for data recovery this isn’t necessary.

So how do you know which /dev/da## is your media? The easy way to tell is that it’ll appear on the console when you first plug it in. If you don’t have access to the console it’ll be in /var/log/messages. You’ll see something like this:

Jun 10 17:54:24 datarec kernel: umass0 on uhub5
kernel: umass0: <vendor 0x13fe USB DISK 3.0, class 0/0, rev 2.10/1.00, addr 2> on usbus1
kernel: umass0 on uhub5
kernel: umass0: on usbus1
kernel: umass0: SCSI over Bulk-Only; quirks = 0x8100
kernel: umass0:7:0: Attached to scbus7
kernel: da0 at umass-sim0 bus 0 scbus7 target 0 lun 0
< USB DISK 3.0 PMAP> Removable Direct Access SPC-4 SCSI device
kernel: da0: Serial Number 070B7126D1170F34
kernel: da0: 40.000MB/s transfers
kernel: da0: 59088MB (121012224 512 byte sectors)
kernel: da0: quirks=0x3
kernel: da0: Write Protected

So this is telling us that it’s da0 (i.e. /dev/da0).

The hardware identification is “<vendor 0x13fe USB DISK 3.0, class 0/0, rev 2.10/1.00, addr 2> on usbus1” which means it’s on USB bus 1, address 2.

You can confirm this using the usbconfig utility with no arguments:

ugen5.1:  at usbus5, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
...snip...
ugen1.1: at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
ugen1.2: at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (300mA)

There it is again, last line.

usbconfig has lots of useful commands, but the ones we’re interested in are power_off and power_on. No prizes for guessing what they do. However, unless you specify a target it’ll switch off every USB device on the system – including your keyboard, probably.

There are two ways of specifying the target, but I’m using the -d method. We’re after device 1.2 so the target is -d 1.2

Try it and make sure you can turn your USB device off and on again. You’ll have to wait for it to come back online, of course.
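
For instance, to power-cycle the device we identified at bus 1, address 2:

usbconfig -d 1.2 power_off
sleep 5
usbconfig -d 1.2 power_on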

There are ways of doing this on Linux by installing extra utilities such as hub-ctrl. You may also be able to do it by writing to /sys/bus/usb/devices/usb#/power/level – see the manual that came with your favourite Linux distro.

The next thing we need is an option for ddrescue so that it actually times out if the memory stick crashes. The default is to wait forever. The --timeout=25 or -T 25 option (depending on your taste in options) sees to that, making it exit if it hasn’t managed a successful read for 25 seconds. This isn’t entirely what we’re after, as a failed read would also indicate that the drive hadn’t crashed; unfortunately there’s no such tweak for ddrescue, but failed reads tend to be quick, so you’d expect a good read within a reasonable time anyway.

So as an example of putting it all into action, here’s a script for recovering a memory stick called duracell (because it’s made by Duracell) on USB bus 1 address 2.

#!/bin/sh
# Loop until ddrescue completes, power-cycling the stick whenever it
# times out (i.e. the drive controller has probably crashed).
# 'until' rather than 'while !' so that $? below is ddrescue's real
# exit status.
until ddrescue -T 25 -u /dev/da0 duracell.img duracell.map
do
        echo ddrescue returned $?
        usbconfig -d 1.2 power_off
        sleep 5
        usbconfig -d 1.2 power_on
        sleep 15
        echo Restarting
done

A few notes on the above. Firstly, ddrescue’s return code isn’t well defined. However, it appears to do what one might expect, so the loop above will drop out if the copy ever completes. I’ve set the timeout since the last good read to 25 seconds, which seems about right. Turning off the power for 5 seconds and then waiting 15 seconds for the system to recognise the drive may be a bit long – tune as required. I’m also using the -u option to tell ddrescue to only go forward through the drive, as it’s easier to read the status when it’s always incrementing. Going backwards and forwards makes sense with mechanical drives, but not flash memory.

Aficionados of ddrescue might want to consider disabling scraping and/or trimming (probably trimming) but I’ve seen it recover data with both enabled. Data recovery is an art, so tweak away as you see fit – I wanted to keep this example simple.
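
If you do want to experiment, recent versions of GNU ddrescue spell these as long options – check your version’s manual, as the names have changed over the years. A variant of the command line in the loop above might be:

ddrescue -T 25 -u --no-trim /dev/da0 duracell.img duracell.map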

Now this system isn’t perfect. I’m repurposing ddrescue, which does a fine job on mechanical drives, to recover data from a very different animal. I may well write a special version for USB flash drives, but this method does actually work quite well. Let me know how you get on.