FreeBSD/Linux as Fibre Broadband router

British Telecom, bless them, has decided that copper telephone lines have to go and is forcing everyone onto fibre Internet and VoIP. Except rural customers currently connected to the Internet using a wet piece of string if they’re lucky, of course.

Incidentally, “Fibre Broadband” is a nonsense in a technical sense but the battle is lost – the public believes Broadband is any Internet connection to the home that isn’t dial-up.

Although I’ve written about routing on FreeBSD before, I thought it was time for an update. Why route on FreeBSD? Because unlike the cheap and nasty “routers” supplied by domestic (and some commercial) ISPs, it doesn’t crash. You don’t have to turn it off and on again. And it does what it’s told, with great diagnostics. You can also run plenty of other services on the same box if it’s powerful enough, or your throughput is modest.

Most of this should work fine on Linux, although the networking is generally considered less efficient than the real thing. However, at less than 1Gbps on a single line this isn’t going to matter, if it matters at all. With Linux you get less of the nuts and bolts built in to the base system so you may have to install extra packages depending on which distribution you are using. But this is all standard stuff so shouldn’t be too difficult. It’s the settings that matter, and probably the reason you’re reading this!

In this first article I’ll just consider a gateway router with NAT, and leave DNS, DHCP and other options until later.

Setting up PPPoE using user-ppp

First off, your WAN connection. With FTTC and FTTP this is normally a little white box – either a VDSL modem or an ONT. It connects to the phone line or fibre cable on one end, and has an RJ45 on the other that looks like Ethernet, because it is Ethernet. I’m going to call them Ethernet Modems, as they’re treated the same for our purpose. However, being Ethernet won’t do you much good as it’s just talking a protocol called PPPoE – or Point-to-Point Protocol over Ethernet.

PPP is an old protocol for making an Internet connection using dial-up, but it’s evolved (or suffered mission creep) and it’s now rather complicated thanks to all the baggage. Fortunately you can ignore the baggage and concentrate on the PPPoE stuff, once you know which is which. And that’s always the trick.

You’ll need a host (i.e. computer) with two Ethernet ports unless you want a complicated life. If you’re using an old PC with just one you can get away with a USB3 Ethernet adapter, but having a couple of server-grade NICs on the motherboard or add-on cards is the best way to go. Very generally, Intel or Broadcom are good choices, Realtek is at the low end.

You need to connect your Ethernet Modem to one port on your host and the other port goes to the LAN.

If your Ethernet Modem and the host you’re planning to use as a router are in different places you can connect them using a VLAN. It’s proper Ethernet and can be switched. Without a VLAN it’s not so simple, so plug it in using a direct cable.

PPP is built in (to FreeBSD etc) in the base system. Type ppp (as root) and it’ll start up in interactive mode. If it doesn’t, you’re not using BSD and therefore lack a base system and will have to install it as a package. You might like to start here: https://tldp.org/HOWTO/PPP-HOWTO/

Although you can compile PPP support into the kernel, the ppp we’re talking about is a program written by Toshiharu OHNO and Brian SOMERS in the early 1990s, and part of BSD since FreeBSD and OpenBSD 2. It’s the normal straightforward way of doing things.

ppp has a simple config file in /etc/ppp/ppp.conf. It can contain profiles for multiple services in sections, with the service name being arbitrary, and ending in a colon (“:”). You specify the service when you run it, and stuff in other sections is ignored. This is a hangover from the days when people had multiple dial-up connections.
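
If you inherit a ppp.conf and want to see which services it defines, something like this sketch will list them. It assumes the usual layout, where a section label starts in the first column and ends with a colon and the commands under it are indented; list_ppp_profiles is just a name I’ve made up for illustration.

```shell
#!/bin/sh
# list_ppp_profiles FILE
# Print the section labels found in a ppp.conf-style file.
# Assumption: labels start in column one and end with a colon, and the
# commands that belong to them are indented. Commented-out labels are
# skipped.
list_ppp_profiles() {
    awk 'substr($0, 1, 1) != " " && substr($0, 1, 1) != "\t" &&
         NF == 1 && $1 ~ /:$/ && $1 !~ /^#/ {
             sub(/:$/, "", $1)   # drop the trailing colon
             print $1
         }' "$1"
}
```

Run it as “list_ppp_profiles /etc/ppp/ppp.conf” and you get one service name per line, which is what you’d pass to ppp on the command line.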

Here’s a service definition for Cloudscape, one of my favourite ISPs, but other UK FTTP services will be similar or identical. UK FTTC and SoGEA modems are pretty much the same too.

cloudscape:
  delete default                # May already have a
                                # default route configured elsewhere
  set device PPPoE:bge1
  set authname user-name-supplied-by-ISP
  set authkey password-supplied-by-ISP
  set dial
  set login
  set lcp
  set mru 1492
  set mtu 1492
  disable ipv6cp              # Turn off IPv6
  enable ipcp                 # Turn on IPv4 (default)
#  enable lqr                 # Turn on Link Quality Requests
                              #   (detect dropped line)
  enable echo                 # Enable echo for LQR
  iface name wan0
  add default HISADDR

The ppp program was originally used for serial PPP connections to dial-up ISPs or organisations, but here we’re just using it for PPPoE. In support of switching ISPs it can add stuff to config files like resolv.conf and the routing table, which in the old days tended to be dynamic.

Feel free to read the manual that explains what the options above do, but briefly I’m starting by deleting the default route, which probably won’t exist unless you’ve configured it (possibly using DHCP), but if it does will cause problems when ppp adds another.

  set device PPPoE:bge1

This says we’re using PPPoE over the bge1 Ethernet card. Obviously set this to the Ethernet card to which your Ethernet Modem (e.g. ONT) is attached.

  set authname user-name-supplied-by-ISP
  set authkey password-supplied-by-ISP

This is the user-name and password supplied by your ISP. These tend to be low security, but are needed for the protocol for historic reasons.

  set dial
  set login
  set lcp

This will cause ppp to dial, log in and negotiate the link using LCP. Some people will try to tell you that internet lines are configured with DHCP – that’s for LANs. On a point-to-point connection LCP (Link Control Protocol) sets up the link itself, and IPCP, which runs over it, provides the same function as DHCP, such as what your IP address is and which DNS servers to use.

  set mru 1492
  set mtu 1492

PPPoE adds eight bytes of protocol data to every packet, so a full 1500-byte packet won’t fit in a standard Ethernet frame once it’s wrapped in PPPoE. Reducing the MTU to 1492 gets around this and avoids fragmentation, which is a good thing. LCP might suggest or force a lower MTU but there’s no harm in specifying it.
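
If you want to see where 1492 comes from, here is the arithmetic as a small sketch. The eight bytes are the six-byte PPPoE header plus the two-byte PPP protocol field; the 40-byte figure for the TCP segment size is my addition and assumes IPv4 and TCP headers with no options. The function names are made up for illustration.

```shell
#!/bin/sh
PPPOE_OVERHEAD=8     # 6-byte PPPoE header + 2-byte PPP protocol field
IP_TCP_HEADERS=40    # 20-byte IPv4 header + 20-byte TCP header, no options

# pppoe_mtu ETHERNET_MTU: largest IP packet that fits once PPPoE is added
pppoe_mtu() { echo $(( $1 - PPPOE_OVERHEAD )); }

# tcp_mss MTU: largest TCP payload that fits in one packet of that MTU
tcp_mss() { echo $(( $1 - IP_TCP_HEADERS )); }
```

“pppoe_mtu 1500” prints 1492, and “tcp_mss 1492” prints 1452, which is the largest TCP segment that will cross the link without fragmenting.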

  disable ipv6cp              # Turn off IPv6
  enable ipcp                 # Turn on IPv4 (default)
#  enable lqr                 # Turn on Link Quality Requests
                              #  (detect dropped line)
  enable echo                 # Enable echo for LQR

This disables IPv6 and enables IPv4 (which is on by default anyway). If you want to use IPv6 your service provider needs to support it, and most don’t.

LQR is probably not going to be necessary for our purposes and generates warnings, so I’ve left the line in but commented it out for now. The enable echo therefore has no effect.

  iface name wan0

By default, ppp will name its connections as tun0, tun1 and so on (tun being Tunnel). This means that you never know what the interface is going to be called, as other tunnels may exist before you start this one. We’re going to be referring to the interface in the PF firewall, so it helps to be sure what its name will be. The line above sets the name manually, and I’ve called it wan0, which is logical. You may, of course, have multiple WAN connections including dial-up backups, so giving them a sensible name is, er, sensible. You can call it anything you like if you’re nuts.

  add default HISADDR

This is an example of ppp messing with your system configuration – in this case it’s taking the IP address of the far end of the link, negotiated when the connection comes up and represented by the macro HISADDR, and adding it as the default route. If you have a static IP address you might want to set it statically in the normal way.

Likewise, if you add the line “enable dns” it will take the DNS servers offered by the ISP during negotiation and add them to resolv.conf. It won’t remove them, and may well end up messing up whatever local DNS arrangements you have, so I prefer to do this manually.

Once you’ve edited ppp.conf you can test it out interactively with “ppp cloudscape” and see what happens. Type “dial” and it should make the connection, and wan0 should appear in your list of network interfaces. Use netstat -r to see if the new default route has appeared.
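
If you’d rather script the route check than eyeball it, a filter along these lines will do. It’s a sketch that assumes the current FreeBSD netstat layout, where the interface name (Netif) is the fourth column; other systems lay the table out differently. default_route_if is a name I’ve invented.

```shell
#!/bin/sh
# default_route_if
# Read "netstat -rn" output on stdin and print the interface used by the
# first default route, or nothing if there isn't one.
# Assumption: FreeBSD column order (Destination Gateway Flags Netif Expire).
default_route_if() {
    awk '$1 == "default" { print $4; exit }'
}
```

Use it as “netstat -rn -f inet | default_route_if”. If the connection came up properly it should print wan0.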

Setting up the pf firewall

user-ppp is a large program that tries to do everything, including NAT and being a firewall. This isn’t very UNIX-like in philosophy, but you can use these facilities if you like. I prefer to have a dedicated standard firewall, PF, and leave that to do everything firewall-like.

If you’re setting up a router you’re probably going to need asymmetric NAT. Your /etc/pf.conf file will look something like this:

scrub in all
WAN=wan0
WANIP=1.2.3.4
nat pass on $WAN from 192.168.1.0/24 to any -> $WANIP
#rdr pass on $WAN proto tcp from any to $WANIP port 80 -> 192.168.1.123

The WAN IP comes from your ISP, although you will be able to see it using “ifconfig wan0” if you don’t have it. I’m assuming your LAN is 192.168.1.0/24 – just set this to whatever you’re using. And that’s about it.
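
To pull the address out of ifconfig without squinting, a filter like this sketch will do. It assumes the first “inet” line in the output is the one you want, which is normally true for a PPP interface; wan_ip is a made-up name.

```shell
#!/bin/sh
# wan_ip
# Read "ifconfig <interface>" output on stdin and print the first IPv4
# address it finds.
# Assumption: the local address is the second word on the "inet" line.
wan_ip() {
    awk '$1 == "inet" { print $2; exit }'
}
```

Run “ifconfig wan0 | wan_ip” and put the result in WANIP. Before loading the rules for real, “pfctl -nf /etc/pf.conf” will parse the file and report syntax errors without changing anything.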

As a bonus, the commented out example line at the end would forward external port 80 to a web server on LAN address 192.168.1.123 – an open port. Peter Hansteen has written an excellent book on PF, called “The Book of PF”, which will tell you everything you need to know, and it’s well documented in various online handbooks and man pages, unlike user-ppp’s built in firewall.

The only reason for using user-ppp for NAT is if you’re on a dynamic IP address, in which case add “enable nat” and set ppp_nat=yes in /etc/rc.conf.

Kicking it all off

First you need to enable routing:

 sysctl net.inet.ip.forwarding=1

This will work until reboot, and you can turn it off again by setting it to zero if something bad happens, like your NIC catching fire. Then dial your ISP (Cloudscape in this example):

ppp -ddial cloudscape

You should now have a connection to the Internet on the BSD box. Now enable PF for NAT.

service pf start (or onestart)

Or if it’s already running, use “service pf reload” to load the new config. At this point every machine on the LAN should be able to use the router’s LAN IP address as a gateway.

When you’re happy it works, to make this kick off automatically on boot, modify /etc/rc.conf:

sysrc ppp_enable=yes
sysrc ppp_mode=ddial
sysrc ppp_profile="cloudscape"
sysrc pf_enable=yes
sysrc gateway_enable=yes

Optionally “sysrc ppp_nat=yes” if you’re not using PF for NAT. Or if you’re editing rc.conf directly:

pf_enable=yes
gateway_enable=yes

ppp_enable="YES"
ppp_mode="ddial"
#ppp_nat="YES"	# We let PF do NAT
ppp_profile="name_of_service_provider"
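
If you want to double-check that nothing got missed before you reboot, here’s a rough sketch that looks for the five settings above in an rc.conf-style file. It only checks that each one is present, not what it’s set to, and it won’t see anything set in other files such as rc.conf.local. check_router_rc is a name I’ve made up.

```shell
#!/bin/sh
# check_router_rc FILE
# Report which of the router settings used in this article are missing
# from an rc.conf-style file. Returns 0 if all are present, 1 otherwise.
# Assumption: each setting sits on its own line as key=value.
check_router_rc() {
    file=$1
    missing=0
    for key in ppp_enable ppp_mode ppp_profile pf_enable gateway_enable; do
        if ! grep -Eq "^${key}=" "$file"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it as “check_router_rc /etc/rc.conf”. No output means all five settings were found.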

I will do a part two to this post explaining how to configure DNS and DHCP, although there’s no reason these need to be on the same host you’re using as a router. In fact it’s good practice to separate them and have more than one DHCP and DNS server if you have the resources.

I hope you found it useful – if you have any questions, add a comment below.

How to tell if a host is up without ping

Some people seem to think that disabling network pings (ICMP echo requests to be exact) is a great security enhancement. If attackers can’t ping something they won’t know it’s there. It’s called Security through Obscurity and only a fool would live in this paradise.

But supposing you have something on your network that disables pings and you, as the administrator, want to know if it’s up? My favourite method is to send an ARP packet to the IP address in question, and you’ll get a response.

ARP is how you translate an IP address into a MAC address to get the Ethernet packet to the right host. If you want to send an Ethernet packet to 1.2.3.4 you put out an ARP request “Hi, if you’re 1.2.3.4 please send your MAC address to my MAC address”. If a device doesn’t respond to this then it can’t be on an Ethernet network with an IP address at all.

You can quickly write a program to do this in ‘C’, but you can also do it using a shell script, and here’s a proof of concept.

#!/bin/sh
! test -n "$1" && echo "$0: Missing hostname/IP" && exit 1
#arp -d "$1"  >/dev/null 2>/dev/null
ping -t 1 -c 1 -q "$1" >/dev/null 2>&1
arp "$1" | grep -q "expires in" && echo "$1 is up." && exit
echo "$1 is down."

You run this with a single argument (hostname or IP address) and it will print out whether it is down or up.

The first line is simply the shell needed to run the script.

Line 2 bails out if you forget to add an argument.

Line 3, which is commented out, deletes the host from the ARP cache if it’s already there. This probably isn’t necessary in reality, and you need to be root user to do it. IP address mappings are typically deleted after 20 minutes, but as we’re about to initiate a connection in line 4 it’ll be refreshed anyway.

Line 4 sends a ping to the host. We don’t care if it replies. The timeout is set to the minimum 1 second, which means there’s a one second delay if it doesn’t reply. Other ways of tricking the host into replying exist, but every system has ping, so ping it is here.

Line 5 will print <hostname> is up if there is a valid ARP cache entry, which can be determined by the presence of “expires in” in the output. Adjust as necessary.

The last line, if still running, prints <hostname> is down. Obviously.

This only works across Ethernet – you can’t get an ARP resolution on a different network (i.e. once the traffic has got through a router). But if you’re on your organisation’s LAN and looking to see if an IoT device is offline, lost or stolen then this is a quick way to poll it and check.
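
If you want a bit more than up or down, the same idea can be stretched to three states. This is a sketch that assumes FreeBSD-style arp output: “expires in” for a live entry and “incomplete” where a request went out and nothing answered. Anything else, such as no entry at all, is reported as unknown. arp_state is a name I’ve invented.

```shell
#!/bin/sh
# arp_state
# Read the output of "arp <host>" on stdin and print up, down or unknown.
# Assumption: FreeBSD-style wording in the arp output.
arp_state() {
    entry=$(cat)
    case $entry in
        *"expires in"*) echo up ;;       # live, recently learned entry
        *incomplete*)   echo down ;;     # asked, but nobody answered
        *)              echo unknown ;;  # no entry, or wording not recognised
    esac
}
```

After the ping, run “arp 192.168.1.123 | arp_state” (the address is just an example) and act on the word it prints.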

Why can’t I ping my Amazon Echo?

The simple answer is that the current Amazon Echo devices don’t respond to a ping – or technically an ICMP echo request. There’s a lot of waffle on the web saying this is because they’re too simple to do it, but this isn’t the case. The original Echo (at least before software updates) and the Echo Show 8” most certainly did respond to a ping, but the functionality has been dropped since then. Some people naively think that it’s a security risk, part of a doctrine known as Security Through Obscurity. As it’s easy enough to find an Echo without a ping, it’s only a slight inconvenience to a would-be attacker and a big inconvenience to a network administrator.

Most later Echos do have open ports, however, so you can check to see if it’s alive because the port will be there. I emphasise “open”, as Echos use quite a lot of ports that aren’t always open, for things like setup or communicating out. But these ports are open and can be connected to – even if the connection is refused it shows there’s something there to refuse it.

Based on my incomplete collection of Echo devices, they have the following characteristics:

Model                                   Ping?  Ports
Original Echo
Echo Dot fourth Generation                     1080, 6543, 8888
Echo Flex                                      1080, 8888
Echo Dot Second Generation                     1080, 8888
Echo Dot Third Generation                      1080, 8888
Echo Show 8-inch (second generation)    Y      8009
Echo Spot first Generation
Echo Show 5-inch
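
If you do want to poke at the ports, this sketch tries each one in turn and prints the ones that accept a connection. It assumes nc is installed and supports -z (connect without sending data) and -w (timeout in seconds). It only reports ports that accept, so a refused connection shows up as nothing at all. open_ports and PROBE are names I’ve made up.

```shell
#!/bin/sh
# open_ports HOST PORT...
# Print each port on HOST that accepts a TCP connection.
# PROBE is the command used to test one port; override it if your nc
# takes different flags.
PROBE=${PROBE:-"nc -z -w 1"}

open_ports() {
    host=$1
    shift
    for port in "$@"; do
        # $PROBE is deliberately unquoted so its flags are split into words
        $PROBE "$host" "$port" >/dev/null 2>&1 && echo "$port"
    done
    return 0
}
```

For example, “open_ports 192.168.1.50 1080 6543 8009 8888” (the address is made up) tries every port in my list above and prints the ones that answer.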

So how can you reliably tell if your Amazon Echo device is alive on the network? Rather than messing around with ports, my favourite way is to send it an Ethernet ARP request and see if you get a reply. I did say disabling ping was a fool’s solution to security.

See “How to tell if a host is up without ping” for how to do this.

How to improve Sage network performance

If you accept that Sage Line 50 is fundamentally flawed when working over a network you’re not left with many options other than waiting for Sage to fix it. All you can do is throw hardware at it. But what hardware actually works?

First the bad news – the difference in speed between a standard server and a turbo-nutter-bastard model isn’t actually that great. If you’re lucky, on a straight run you might get a four-times improvement from a user’s perspective. The reason for spending lots of money on a server has little to do with the speed a user sees; it’s much more to do with the number of concurrent users.

So, if you happen to have a really duff server and you throw lots of money at a new one you might see something that took a totally unacceptable 90 minutes now taking a totally unacceptable 20 minutes. If you spend a lot of money, and you’re lucky.

The fact is that on analysing the server side of this equation I’ve yet to see the server itself struggling with CPU time, or running out of memory or anything else to suggest that it’s the problem. With the most problematic client they started with a Dual Core processor and 512MB of RAM – a reasonable specification for a few years back. At no time did I see issues to do with the memory size and the processor utilisation was only a few percent on one of the cores.

I’d go as far as to say that the only reason for upgrading the server is to allow multiple users to access it on terminal server sessions, bypassing the network access to the Sage files completely. However, whilst this gives the fastest possible access to the data on the disk, it doesn’t overcome the architectural problems involved with sharing a disk file, so multiple users are going to have problems regardless. They’ll still clash, but when they’re not clashing it will be faster.

But, assuming you want to run Line 50 multi-user the way it was intended, installing the software on the client PCs, you’re going to have to look away from the server itself to find a solution.

The next thing Sage will tell you is to upgrade to 1Gb Ethernet – it’s ten times faster than 100Mb, so you’ll get a 1000% performance boost. Yeah, right!

It’s true that the network file access is the bottleneck, but it’s not the raw speed that matters.

I’ll let you into a secret: not all network cards are the same.

They might communicate at a line speed of 100Mb, but this does not mean that the computer can process data at that speed, and it does not mean it will pass through the switch at that speed. This is even more true at 1Gb.

This week at Infosec I’ve been looking at some 10Gb network cards that really can do the job – communicate at full speed without dropping packets and pre-sort the data so a multi-CPU box could make sense of it. They cost $10,000 each. They’re probably worth it.

Have you any idea what kind of network card came built in to the motherboard of your cheap-and-cheerful Dell? I thought not! But I bet it wasn’t the high-end type though.

The next thing you’ve got to worry about is the cable. There’s no point looking at the wires themselves or what the LAN card says it’s doing. You’ll never know. Testing a cable has the right wires on the right pins is not going to tell you what it’s going to do when you put data down it at high speeds. Unless the cable’s perfect it’s going to pick up interference to some extent; most likely from the wire running right next to it. But you’ll never know how much this is affecting performance. The wonder of modern networking means that errors on the line are corrected automatically without worrying the user about it. If 50% of your data gets corrupted and needs re-transmission, by the time you’ve waited for the error to be detected, the replacement requested, the intervening data to be put on hold and so on your 100Mb line could easily be clogged with 90% junk – but the line speed will still be saying 100Mb with minimal utilisation.

Testing network cables properly requires some really expensive equipment, and the only way around it is to have the cabling installed by someone who really knows what they’re doing with high-frequency cable to reduce the likelihood of trouble. If you can, hire some proper test gear anyway. What you don’t want to do is let an electrician wire it up for you in a simplistic way. They all think they can, but believe me, they can’t.

Next down the line is the network switch and this could be the biggest problem you’ve got. Switches sold to small business are designed to be ignored, and people ignore them. “Plug and Play”.

You’d be forgiven for thinking that there wasn’t much to a switch, but in reality it’s got a critical job, which it may or may not do very well in all circumstances. When it receives a packet (sequence of data, a message from one PC to another) on one of its ports it has to decide which port to send it out of to reach its intended destination. If it receives multiple packets on multiple ports it has to handle them all at once. Or one at a time. Or give up and ask most of the senders to try again later.

What your switch is doing is probably a mystery, as most small businesses use unmanaged “intelligent” switches. A managed switch, on the other hand, lets you connect to it using a web browser and actually see what’s going on. You can also configure it to give more priority to certain ports, protect the network from “packet storms” caused by accident or malicious software and generally debug poorly performing networks. This isn’t intended to be a tutorial on managed switches; just take it from me that in the right hands they can be used to help the situation a lot.

Unfortunately, managed switches cost a lot more than the standard variety. But they’re intended for the big boys to play with, and consequently they tend to switch more simultaneous packets and stand up to heavier loads.

Several weeks back I upgraded the site with the most problems from good quality standard switches to some nice expensive managed ones, and guess what? It’s made a big difference. My idea was partly to use the switch to snoop on the traffic and figure out what was going on, but as a bonus it appears to have improved performance and, most importantly, reliability considerably too.

If you’re going to try this, connect the server directly to the switch at 1Gb. It doesn’t appear to make a great deal of difference whether the client PCs are 100Mb or 1Gb, possibly due to the cheapo network interfaces they have, but if you have multiple clients connected to the switch at 100Mb they can all simultaneously access the server down the 1Gb pipe at full speed (to them).

This is a long way from a solution, and it’s hardly been conclusively tested, but the extra reliability and resilience of the network has at least allowed a Sage system to run without crashing and corrupting data all the time.

If you’re using reasonably okay workstations and a file server, my advice (at present) is to look at the switch first, before spending money on anything else.

Then there’s the nuclear option, which actually works. Don’t bother trying to run the reports in Sage itself. Instead dump the data to a proper database and use Crystal Reports (or the generator of your choice) to produce them. I know someone who was tearing their hair out because a Sage report took three hours to run; the same report took less than five minutes using Crystal Reports. The strategy is to dump the data overnight and knock yourself out running reports the following day. Okay, the data may be a day old but if it’s taking most of the day to run the report on the latest data, what have you really lost?

I’d be really interested to hear how other people get on.