Network – Frank Leonhardt's Blog

FreeBSD/Linux as Fibre Broadband router Part 2

In part one I described how to set up PPP and the pf firewall to provide NAT with port forwarding and other good things. In Part 2 I’ll add DCHP, and as a bonus I’ll add configuration for an IP address blockfor if you have that kind of ISP. If you want that kind of ISP but can’t find one, I can point at a few that do. In Part 3 I’ll cover DNS and BIND.

DHCP

There’s never been a DHCP server in the FreeBSD base, but it’s installed easily by compiling the port or installing the package. Your best bet for FreeBSD is the DHCP daemon written for OpenBSD, AKA the ISC dhcpd. But beware – the OpenBSD one, although called version 6.6, lags behind the other package isc-dhcp44 as it doesn’t have support peer servers. If you’ve only got one DHCP server on your network, it’s fine. If you want to have primary and secondary servers, or load balance them, look at the latest ISC one instead. I’ll deal with that in another post.

pkg install dhcpd

Before you kick it off you really ought to edit the configuration file, /usr/local/etc/dhcpd.conf. There’s usually a second copy of it postfixed with .sample, and it’s pretty self documenting. I’m posting the basics from a real configuration, which I shall annotated to death. But first, something about the network we’re defining:

I’m going to have a LAN with 192.168.1.0/24 – which means IP addresses in the range 192.168.1.1 to 192.168.1.254. This isn’t a tutorial on routing – just leave the first and last address (0 and 255) alone for now. The network will have a domain. This is optional, but if you’re doing your own DNS you’ll want one. You don’t have to register this domain externally – you can make it up (please end it in .local!) – but let’s assume you have a real one: “example.com”. You’ve created an subdomain for this site called mysite.example.com and it has an A record to prove it, and you’ll probably want to delegate the DNS to it later. But if you’re not worried about domain names, don’t worry about all of this.

The router (i.e. the FreeBSD box) is going to be on 192.168.1.2, which is set up in rc.conf. It can’t be assigned automatically by DHCP because, well, we’re also the DHCP server and that would be silly.

Assuming your LAN-side network interface is bge0 (remember the modem is on bge1 in Part 1) the following line would do it:

sysrc ifconfig_bge0="inet 192.168.1.2 netmask 255.255.255.0"

Obviously change bge0 to the name of your actual Ethernet interface! You might wonder why I’m putting the router on 192.168.1.2 instead of 192.168.1.1, which is a common convention. It’s simple: There are so many home user network appliances that come with 192.168.1.1 as their default IP address, and if you plug one in to your LAN the clash will cause merry hell before you’ve been able to go to their web interface to configure it to something else.

I want some devices to have a fixed IP address supplied by DHCP, and other things to have dynamically allocated ones – friends using the guest WiFi, for example. Having network infrastructure like switches and WAPs on a static addressed, defined by DHCP, is a good way to go. Connecting network printers to Windoze is smoother if they’re on a fixed IP too. But going around and setting it on each device is a pain, so do it by DHCP where it’s defined in one place and can be managed in one place. It works by recognising the MAC address in the request and giving back whatever IP address you have chosen.

As a final tip, keep your network address plan as comments in dhcpd.conf – it’s where you want the information anyway. And with that, here’s the sample file:


# This is the domain name that will be supplied to everything on
# the LAN by default. This is the domain that will be searched if you
# enter a host name. For example, if you want to connect to "fred-pc" it
# will look for it as fred-pc.mysite.example.com, which if you have
# your DNS set up correctly, will find it quickly.
option domain-name "mysite.example.com";

# This specifies the DNS server(s) the machines on the LAN
# will use. We're specifying the same as the router, because
# we'll be running DNS there. If you don't want to, just use the
# IP address of DNS server supplied by your ISP.
option domain-name-servers 192.168.1.2;

# These just specify the time a machine on the LAN gets to hold
# on to a dynamic address before it needs to renew it.
default-lease-time 43200;
max-lease-time 86400;

# This defines our pool of dynamically allocated addresses,
# and I've chosen the range 100..199. Options here override the
# options above (outside the {...}) in the way you might expect.
# I've set the default lease time to 900 seconds (15 minutes)
# for testing purposes only. 2h is normal but it's up to you.
# I normally go for 12h.

subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.100 192.168.1.199;
  option broadcast-address 192.168.1.255;
  default-lease-time 900;
}

# The next block is assigning a fixed IP address to 
# a switch, because I don't want it to move. This just needs the 
# MAC address of the device and the fixed-address you want to give it.
# You can have as many of these as you like. The name "switch1" is really
# just for your own reference.

host switch1 {
        hardware ethernet 00:02:FC:CB:1E:7D;
        fixed-address 192.168.1.3;
}

For more information see this post about assigning names, and the dhcpd.conf.sample, which has scenarios far more complex than you’ll need on a simple LAN.

Enable it on reboot with:

sysrc dchpd_enable=yes

You can then start it manually with service dhcpd start.

If you want to make changes to dhcpd.conf you can at any time, but they won’t take effect until you restart dhcpd (with service dhcpd restart). There’s no way of having it just do a reload. Details of the leases it has issued are /var/db/dhcpd.leases, which is just a text file and you can easily read it.

Routing a whole subnet

Supposing you have more than one IP address coming down the PPPoE tunnel at you? This is a service you can get from your ISP, giving you multiple IP addresses for various purposes – such as running servers. Other ISPs give you a single dynamic address, or worse, an IP address generated by CG-NAT. I’d argue this ceases to meet the definition of “Internet Service” at this point.

But assuming you have a block of static addresses, how do you get ppp to use them? I haven’t seen this documented ANYWHERE and figuring it out involved a great deal of trial and error. Shout out to shurik for encouraging me to keep going where ppp.linkup was concerned.

The easy way to add an alias to your tunnel (which you’ll recall we called wan0) is to use ifconfig and simply add it. But the trick with tunnels is to add the alias IP address and the remote tunnel address (i.e. HISADDR). You can find out what HISADDR is using ifconfig:

# ifconfig wan0
wan0: flags=1008051<UP,POINTOPOINT,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 1492
        options=80000<LINKSTATE>
        inet 1.2.3.4 --> 44.33.22.11 netmask 0xffffffff
        groups: tun
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        Opened by PID 658

In the output above, 1.2.3.4 is the IP address supplied by LCP – i.e. your public IP address. 44.33.22.11 is the IP address of the other end of the tunnel. In the parlance of the PPP utility, HISADDR. Earlier we set the default route to HISADDR. There are good reasons why HISADDR is dynamic, not least of which is having a pool of gateways for redundancy, so you have to check what it actually IS today before you assign an alias public address to the tunnel.

Then it’s a simple matter of adding further addresses using ifconfig:

ifconfig wan0 alias 1.2.3.41/32 44.33.22.11

Yes, it’s not quote the same format as adding an alias to an Ethernet interface, as the remote address follows the local one.

You can write a little script to do them automatically:

#!/bin/sh
HISADDR=$(ifconfig wan0 | grep "inet 1.2.3.4" | cut -w -f 5)
ALIASES="1.2.3.41 1.2.3.42 1.2.3.43 1.2.3.44"
for a in $ALIASES
do
    ifconfig wan0 delete $a
    ifconfig wan0 alias $a/32 $HISADDR
done

Note that I’m using grep to find the correct inet address based on the static address I know the interface has. Fiddle this to suit your static address, or if you don’t have one, grep for inet and hope the first it finds is correct. I’m also deleting the old aliases as they might need to be recreated using the new HISADDR.

This is all well and good, but when do you run the script? Automating it is the trick. Fortunately there’s a hook in ppp, where it processes the file /etc/ppp/ppp.linkup when the link comes up. As far as I can tell it’s the same format as ppp.conf, and you have to label the service name in the same way. What’s not documented is how you add alias addresses, but I’ve found a way by getting it to run ipconfig for you. If you start a line with ” !bg “, what follows is run. It’s run without an environment so you have to specify all paths to whatever you want to run in full, but it does work and does expand macros like HISADDR. The space in front of the ! is important! Incidentally, there’s also a ppp.linedown.

Here’s my /etc/ppp/ppp.linkup

cloudscape:
  !bg /sbin/ifconfig wan0 alias 1.2.3.40/32 HISADDR
  !bg /sbin/ifconfig wan0 alias 1.2.3.41/32 HISADDR
  !bg /sbin/ifconfig wan0 alias 1.2.3.42/32 HISADDR
  !bg /sbin/ifconfig wan0 alias 1.2.3.43/32 HISADDR

I would very much like to find the documentation for this, but the author (Brian Somer) has moved on to other things and the documentation that’s out there appears to be all there is. It was written for dial-up connections and wasn’t really designed for fixed lines with multiple public IP addresses.

Meanwhile the other PPP demon, mpd5, which is supposed to be better, was listed in the FreeBSD Handbook as being for PPPoA, pushing user-ppp for PPPoE. This isn’t actually the case, and I may be revisiting this using mpd5 at some point because it’s faster and more efficient, and I don’t need all the extra wonderful NAT and firewall features of user-ppp.

21-July-256-October-25

FreeBSD/Linux as Fibre Broadband router

British Telecom, bless them, has decided that copper telephone lines have to go and is forcing everyone onto fibre Internet and VoIP. Except rural customers currently connected to the Internet using a wet piece of string if they’re lucky, of course.

Incidentally, “Fibre Broadband” is a nonsense in a technical sense but the battle is lost – the public believes Broadband is any Internet connection to the home that isn’t dial-up.

Although I’ve written about routing on FreeBSD before, I thought it was time for an update. Why route on FreeBSD? Because unlike the cheap and nasty “routers” supplied by domestic (and some commercial) ISPs, it doesn’t crash. You don’t have to turn it off and on again. And it does what it’s told, with great diagnostics. You can also run plenty of other services on the same box if it’s powerful enough, or your throughput is modest.

Most of this should work fine on Linux, although the networking is generally considered less efficient than the real thing. However, at less than 1Gbps on a single line this isn’t going to matter, if it matters at all. With Linux you get less of the nuts and bolts built in to the base system so you may have to install extra packages depending on which distribution you are using. But this is all standard stuff so shouldn’t be too difficult. It’s the settings that matter, and probably the reason you’re reading this!

In this first article I’ll just consider a gateway router with NAT, and leave DNS, DHCP and other options until later.

Setting up PPPoE using user-ppp

First off, your WAN connection. With FTTC and FTTP this is normally a little white box – either a VDSL modem or an ONT. It connects to the phone line or fibre cable on one end, and has an RJ45 on the other that looks like Ethernet, because it is Ethernet. I’m going to call them Ethernet Modems, as they’re treated the same for our purpose. However, being Ethernet won’t do you much good as it’s just talking a protocol called PPPoE – or Point-to-Point Protocol over Ethernet.

PPP is an old protocol for making an Internet connection using dial-up, but it’s evolved (or suffered mission creep) and it’s now rather complicated thanks to all the baggage. Fortunately you can ignore the baggage and concentrate on the PPPoE stuff, once you know which is which. And that’s always the trick.

You’ll need a host (i.e. computer) with two Ethernet ports unless you want a complicated life. If you’re using an old PC with just one you can get away with a USB3 Ethernet adapter, but having a couple of server-grade NICs on the motherboard or add-on cards is the best way to go. Very generally, Intel or Broadcom are good choices, Realtek is at the low end.

You need to connect your Ethernet Modem to one port on your host and the other port goes to the LAN.

If you Ethernet Modem and the host you’re planning to use as a router are in different places you can connect them using a VLAN. It’s proper Ethernet and can be switched. Without a VLAN it’s not so simple, so plug it in using a direct cable.

PPP is built in (to FreeBSD etc) in the base system. Type ppp (as root) and it’ll start up in interactive mode. If it doesn’t, you’re not using BSD and therefore lack a base system and will have to install it as a package. You might like to start here: https://tldp.org/HOWTO/PPP-HOWTO/

Although you can compile PPP support into the kernel, the ppp we’re talking about is a program written by Toshiharu OHNO and Brian SOMERS in the early 1990s, and part of BSD since FreeBSD and OpenBSD 2. It’s the normal straightforward way of doing things.

ppp has a simple config file in /etc/ppp/ppp.conf. It can contain profiles for multiple services in sections, with the service name being arbitrary, and ending in a colon (“:”). You specify the service when you run it, and stuff in other sections is ignored. This is a hangover from the days when people had multiple dial-up connections.

Here’s a service definition for Cloudscape, one of my favourite ISPs, but other UK FTTP services will be similar or identical. UK FTTC and SoGEA modems are pretty much the same too.

cloudscape:
  delete default                # May already have a
                                # default route configured elsewhere
  set device PPPoE:bge1
  set authname user-name-supplied-by-ISP
  set authkey password-supplied-by-ISP
  set dial
  set login
  set lcp
  set mru 1492
  set mtu 1492
  nat enable no               # Turn of NAT explicitly!
  disable ipv6cp              # Turn off IPv6
  enable ipcp                 # Turn on IPv4 (default)
#  enable lqr                 # Turn on Link Quality Requests
                              #   (detect dropped line)
  enable echo                 # Enable echo for LQR
  iface name wan0
  add default HISADDR

The ppp program was originally used for serial PPP connections to dial-up ISPs or organisations, but here we’re just using it for PPPoE. In support of switching ISPs it can add stuff to config files like resolv.conf and the routing table, which in the old days tended to be dynamic.

Feel free to read the manual that explains what the options above do, but briefly I’m starting by deleting the default route, which probably won’t exist unless you’ve configured it (possibly using DHCP), but if it does will cause problems when ppp adds another.

  set device PPPoE:bge1

This says we’re using PPPoE over the bge1 Ethernet card. Obviously set this to the Ethernet card to which your Ethernet Modem (e.g. ONT) is attached.

  set authname user-name-supplied-by-ISP
  set authkey password-supplied-by-ISP

This is the user-name and password supplied by your ISP. These tend to be low security, but are needed for the protocol for historic reasons.

  set dial
  set login
  set lcp

This will cause ppp to dial, log in and get details using LCP. Some people will try to tell you that internet lines are configured with DHCP – that’s for LANs. LCP (Link Control Protocol) provides the same function, such as what your IP address is and which DNS servers to use, over a point-to-point connection.

  set mru 1492
  set mtu 1492

There are eight bytes of protocol data added to every standard 1500 byte Ethernet frame so won’t fit 1:1 with a PPPoE packet. Reducing the MTU to 1492 gets around this and avoids fragmentation, which is a good thing. LCP might suggest or force a lower MTU but there’s no harm in specifying it.

    nat enable no

If you’re running on a single IP address, or you don’t mind all of your source addresses being the same IP, you can leave this line off. It shouldn’t be necessary anyway. According to the manual you need to enable ppp to do NAT if you want it. But some bright spark has added the “-nat” option to the FreeBSD service management scripts so you get it by default, whether you asked for it or not. This has the effect of making every source address leaving the tunnel map to the first address configured, leading to some interesting anomalies when you make an outgoing connection. It doesn’t affect incoming address mapping, probably because Brian Somers didn’t envisage a whole subnet using it back in 1998.

  disable ipv6cp              # Turn off IPv6
  enable ipcp                 # Turn on IPv4 (default)
#  enable lqr                 # Turn on Link Quality Requests
                              #  (detect dropped line)
  enable echo                 # Enable echo for LQR

This disables IPv6 and enables IPv4 (which is on by default anyway). If you want to use IPv6 your service provider needs to support it, and most don’t.

LQR is probably not going to be necessary for our purposes and generates warnings, so I’ve left the line in but commented it out for now. The enable echo therefore has no effect.

  iface name wan0

By default, ppp will name its connections as tun0, tun1 and so on (tun being Tunnel). This means that you never know what the interface is going to be called, as other tunnels may exist before you start this one. We’re going to be referring to the interface in the PF firewall, so it helps to be sure what its name will be. The line above sets the name manually, and I’ve called in wan0, which is logical. You may, of course, have multiple WAN connections including dial-up backups, so giving them a sensible name is, er, sensible. You can call it anything you like if you’re nuts.

  add default HISADDR

This is an example of ppp messing with your system configuration – in this case it’s taking the IP address supplied by LCP, represented by the macro HISADDR, and adding it as the default route. If you have a static IP address you might want to set it statically in the normal way.

Likewise, if you add the line “enable dns” it will take the DNS servers offered by LCP and add them to resolv.conf. It won’t remove them, and may well end up messing up whatever local DNS arrangements you have, so I prefer to do this manually.

Once you’ve edited ppp.conf you can test it out interactively with “ppp cloudscape” and see what happens. Type “dial” and it should make the connection, and wan0 should appear in your list of network interfaces. Use netstat -r to see if the new default route has appeared.

Setting up the pf firewall

ppp-user is a large program that tries to do everything, including NAT and being a firewall. This isn’t very UNIX-like in philosophy, but you can use these facilities if you like. I prefer to have a dedicated standard firewall, PF, and leave that to do everything firewall-like in one place.

If you’re setting up a router you’re probably going to need asymmetric NAT. Your /etc/pf.conf file will look something like this:

scrub in all
WAN=wan0
WANIP=1.2.3.4
nat pass on $WAN from 192.168.1.0/24 to any -> $WANIP
#rdr pass on $WAN proto tcp from any to $WANIP port 80 -> 192.168.1.123

The WAN IP comes from your ISP, although you will be able to see it using “ifconfig wan0:” if you don’t have it. I’m assuming your LAN is 192.168.1.0/24 – just set this to whatever you’re using. And that’s about it.

As a bonus, the commented out example line at the end would external port 80 to a web server on LAN address 192.168.1.123 – an open port. Peter Hansteen has written an excellent book on PF, called “The Book of PF”, which will tell you everything you need to know, and it’s well documented in various online handbooks and man pages, unlike ppp-user’s built in firewall.

The only reason for using user-ppp for NAT is if you’re on a dynamic IP address, in which case and “enable nat” and add ppp_nat=yes to /etc/rc.conf

Kicking it all off

First you need to enable routing:

 sysctl net.inet.ip.forwarding=1

This will work until reboot, and you can turn it off again by setting it to zero if something bad happens, like your NIC catching fire. Then dial your ISP (Cloudscape in this example)

ppp -ddial cloudscape

You should now have a connection to the Internet on the BSD box. Now enable PF for NAT.

service pf start (or onestart)

Of it it’s running, use “service pf reload” to load the new config. At this point every machine on the LAN should be able to use your LAN IP address as a gateway.

When you’re happy it works, to make this kick off automatically on boot, modify /etc/rc.conf:

sysrc ppp_enable=yes
sysrc ppp_mode=ddial
sysrc ppp_profile="cloudscape"
sysrc pf_enable=yes
sysrc gateway_enable=yes

Optionally “sysrc ppp_nat=yes” if you’re not using PF for NAT. Or if you’re editing rc.conf directly:

pf_enable=yes
gateway_enable=yes

ppp_enable="YES"
ppp_mode="ddial"
#ppp_nat="YES"	# We let PF do NAT
ppp_profile="name_of_service_provider"

I will do a part two to this post explaining how to configure DNS and DHCP, although there’s no reason these need to be on the same host you’re using as a router. In fact it’s good practice to separate them and have more than one DHCP and DNS server if you have the resources.

I hope you found it useful – any questions add a comment below.

(Updated 06-Oct-2025 when I discovered the FreeBSD service scripts added -nat to the ppp startup without telling anyone).

1-July-25

How to tell if a host is up without ping

Some people seem to think that disabling network pings (ICMP echo requests to be exact) is a great security enhancement. If attackers can’t ping something they won’t know it’s there. It’s called Security through Obscurity and only a fool would live in this paradise.

But supposing you have something on your network that disables pings and you, as the administrator, want to know if it’s up? My favourite method is to send an ARP packet to the IP address in question, and you’ll get a response.

ARP is how you translate an IP address into a MAC address to get the Ethernet packet to the right host. If you want to send an Ethernet packet to 1.2.3.4 you put out an ARP request “Hi, if you’re 1.2.3.4 please send your MAC address to my MAC address”. If a device doesn’t respond to this then it can’t be on an Ethernet network with an IP address at all.

You can quickly write a program to do this in ‘C’, but you can also do it using a shell script, and here’s a proof of concept.

#!/bin/sh
! test -n "$1" && echo $0: Missing hostname/IP && exit
#arp -d $1  >/dev/null 2>/dev/null
ping -t 1 -c 1 -q $1 >/dev/null
arp $1 | grep -q "expires in" && echo $1 is up. && exit
echo $1 is down.

You run this with a single argument (hostname or IP address) and it will print out whether it is down or up.

The first line is simply the shell needed to run the script.

Line 2 bails out if you forget to add an argument.

Line 3, which is commented out, deletes the host from the ARP cache if it’s already there. This probably isn’t necessary in reality, and you need to be root user to do it. IP address mappings are typically deleted after 20 minutes, but as we’re about to initiate a connection in line 4 it’ll be refreshed anyway.

Line 4 sends a ping to the host. We don’t care if it replies. The timeout is set to the minimum 1 second, which means there’s a one second delay if it doesn’t reply. Other ways of tricking the host into replying exist, but every system has ping, so ping it is here.

Live 5 will print <hostname> is up if there is a valid ARP cache entry, which can be determined by the presence of “expires in” in the output. Adjust as necessary.

The last line, if still running, prints <hostname> is down. Obviously.

This only works across Ethernet – you can’t get an ARP resolution on a different network (i.e. once the traffic has got through a router). But if you’re on your organisation’s LAN and looking to see if an IoT devices is offline, lost or stolen then this is a quick way to poll it and check.

1-July-251-July-25

Why can’t I ping my Amazon Echo?

The simple answer is that the current Amazon Echo devices don’t respond to a ping – or technically an ICMP echo request. There’s a lot of waffle on the web saying this is because they’re too simple to do it, but this isn’t the case. The original Echo (at least before software updates) and the Echo Show 8” most certainly did respond to a ping, but the functionality has been dropped since then. Some people naively think that it’s a security risk, part of a doctrine known as Security Through Obscurity. As it’s easy enough to find an Echo without a ping, it’s only a slight inconvenience to a would-be attacker and a big inconvenience to an network administrator.

Most later Echos do have open ports, however, so you can check to see if it’s alive because the port will be there. I emphasise “open”, as Echos use quite a lot of ports that aren’t always open, for things like setup or communicating out. But these ports are open and can be connected to – even if the connection is refused it shows there’s something there to refuse it.

Based on my incomplete collection of Echo devices, they have the following characteristics:

Model	Ping?	Ports
Original Echo	–	–
Echo Dot fourth Generation	–	1080, 6543, 8888
Echo Flex	–	1080, 8888
Echo Dot Second Generation	–	1080, 8888
Echo Dot Third Generation	–	1080, 8888
Echo Show 8-inch (second generation)	Y	8009
Echo Spot first Generation	–	–
Echo Show 5-inch	–

So how can you reliably tell if your Amazon Echo device is alive on the network? Rather than messing around with ports, my favorite way is to send it an ethernet ARP request and see if you get a reply. I did say disabling ping was a fools solution to security.

See here for how to do this.

30-April-102-November-23

How to improve Sage network performance

If you accept that Sage Line 50 is fundamentally flawed when working over a network you’re not left with many options other than waiting for Sage to fix it. All you can do is throw hardware at it. But what hardware actually works?

First the bad news – the difference in speed between a standard server and a turbo-nutter-bastard model isn’t actually that great. If you’re lucky, on a straight run you might get a four-times improvement from a user’s perspective. The reason for spending lots of money on a server has little to do with the speed a user’s sees; it’s much more to do with the number of concurrent users.

So, if you happen to have a really duff server and you throw lots of money at a new one you might see something that took a totally unacceptable 90 minutes now taking a totally unacceptable 20 minutes. If you spend a lot of money, and you’re lucky.

The fact is that on analysing the server side of this equation I’ve yet to see the server itself struggling with CPU time, or running out of memory or any anything else to suggest that it’s the problem. With the most problematic client they started with a Dual Core processor and 512Mb of RAM – a reasonable specification for a few years back. At no time did I see issues to do with the memory size and the processor utilisation was only a few percent on one of the cores.

I’d go as far as to say that the only reason for upgrading the server is to allow multiple users to access it on terminal server sessions, bypassing the network access to the Sage files completely. However, whilst this gives the fastest possible access to the data on the disk, it doesn’t overcome the architectural problems involved with sharing a disk file, so multiple users are going to have problems regardless. They’ll still clash, but when they’re not clashing it will be faster.

But, assuming want to run Line 50 multi-user the way it was intended, installing the software on the client PCs, you’re going to have to look away from the server itself to find a solution.

The next thing Sage will tell you is to upgrade to 1Gb Ethernet – it’s ten times faster than 100Mb, so you’ll get a 1000% performance boost. Yeah, right!

It’s true that the network file access is the bottleneck, but it’s not the raw speed that matters.

I’ll let you into a secret: not all network cards are the same.

They might communicate at a line speed of 100Mb, but this does not mean that the computer can process data at that speed, and it does not mean it will pass through the switch at that speed. This is even more true at 1Gb.

This week at Infosec I’ve been looking at some 10Gb network cards that really can do the job – communicate at full speed without dropping packets and pre-sort the data so a multi-CPU box could make sense of it. They cost $10,000 each. They’re probably worth it.

Have you any idea what kind of network card came built in to the motherboard of your cheap-and-cheerful Dell? I thought not! But I bet it wasn’t the high-end type though.

The next thing you’ve got to worry about is the cable. There’s no point looking at the wires themselves or what the LAN card says it’s doing. You’ll never know. Testing a cable has the right wires on the right pins is not going to tell you what it’s going to do when you put data down it at high speeds. Unless the cable’s perfect its going to pick up interference to some extent; most likely from the wire running right next to it. But you’ll never know how much this is affecting performance. The wonder of modern networking means that errors on the line are corrected automatically without worrying the user about it. If 50% of your data gets corrupted and needs re-transmission, by the time you’ve waited for the error to be detected, the replacement requested, the intervening data to be put on hold and so on your 100Mb line could easily be clogged with 90% junk – but the line speed will still be saying 100Mb with minimal utilisation.

Testing network cables properly requires some really expensive equipment, and the only way around it is to have the cabling installed by someone who really knows what they’re doing with high-frequency cable to reduce the likelihood of trouble. If you can, hire some proper test gear anyway. What you don’t want to do is let an electrician wire it up for you in a simplistic way. They all think they can, but believe me, they can’t.

Next down the line is the network switch and this could be the biggest problem you’ve got. Switches sold to small business are designed to be ignored, and people ignore them. “Plug and Play”.

You’d be forgiven for thinking that there wasn’t much to a switch, but in reality it’s got a critical job, which it may or may not do very well in all circumstances. When it receives a packet (sequence of data, a message from one PC to another) on one of its ports it has to decide which port to send it out of to reach its intended destination. If it receives multiple packets on multiple ports it has handle them all at once. Or one at a time. Or give up and ask most of the senders to try again later.

What your switch is doing is probably a mystery, as most small businesses use unmanaged “intelligent” switches. A managed switch, on the other hand, lets you connect to it using a web browser and actually see what’s going on. You can also configure it to give more priority to certain ports, protect the network from “packet storms” caused by accident or malicious software and generally debug poorly performing networks. This isn’t intended to be a tutorial on managed switches; just take it from me that in the right hands they can be used to help the situation a lot.

Unfortunately, managed switches cost a lot more than the standard variety. But they’re intended for the big boys to play with, and consequently they tend to switch more simultaneous packets and stand up to heavier loads.

Several weeks back I upgraded the site with the most problems from good quality standard switches to some nice expensive managed ones, and guess what? It’s made a big difference. My idea was partly to use the switch to snoop on the traffic and figure out what was going on, but as a bonus it appears to have improved performance, and most importantly, reliability considerably too.

If you’re going to try this, connect the server directly to the switch at 1Gb. It doesn’t appear to make a great deal of difference whether the client PCs are 100Mb or 1Gb, possibly due to the cheapo network interfaces they have, but if you have multiple clients connected to the switch at 100Mb they can all simultaneously access the server down the 1Gb pipe at full speed (to them).

This is a long way from a solution, and it’s hardly been conclusively tested, but the extra reliability and resilience of the network has, at least allow a Sage system to run without crashing and corrupting data all the time.

If you’re using reasonably okay workstations and a file server, my advice (at present) is to look at the switch first, before spending money on anything else.

Then there’s the nuclear option, which actually works. Don’t bother trying to run the reports in Sage itself. Instead dump the data to a proper database and use Crystal Reports (or the generator of your choice) to produce them. I know someone who was tearing their hair out because a Sage report took three hours to run; the same report took less than five minutes using Crustal Reports. The strategy is to dump the data overnight and knock yourself out running reports the following day. Okay, the data may be a day old but if it’s taking most of the day to run the report on the last data, what have you really lost?

I’d be really interested to hear how other people get on.