Restoring a cPanel backup to a system without cPanel

cPanel is a web front end for “reseller” hosting accounts, and it’s very popular with web designers reselling hosting services. It’s very simple to use, and allows the web designers to set up virtual hosting accounts without giving them any real control over the server – self-service and foolproof. It’s also an expensive thing to license. The cost makes sense for a self-service low-cost hosting provider, where the customers do all the work, but for small-scale or “community” hosting providers you’re talking big money.

I’ve just had to rescue a number of web sites from a developer using one of these hosting services – and they had a lot of sites. The only access to the virtual server was through cPanel (and FTP to a home directory). I logged in to cPanel and found an option to create a backup of everything in one big tarball, which looked like just what I wanted to grab them all at once. However, it turned out to be designed to upload and unpack in another cPanel environment.
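
Unpacking the tarball on the new machine is standard stuff – a sketch, assuming it arrived as a gzipped tarball (the exact name cPanel gives it varies):

tar -xzf backup-*.tar.gz
cd backup-*
ls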

Getting out the home directories is pretty straightforward. They end up in a directory called “homedir”, and you just move it to where you want them – i.e. ~username/www/. But how about restoring the dump of the MySQL databases? Actually, that’s pretty simple too. They’re in a directory called “mysql”, but instead of one big dump, each database is in its own file – and without the create commands, which are in another file with the extension “.create” instead of “.sql”. Loading them all manually is going to be a time-wasting PITA, but I’ve worked out that the following shell script will do it for you if you run it while in the backup’s mysql directory:

# For each .create file, feed it and its matching .sql dump into mysql
for name in `find . -name "*.create"`; do
cat $name `echo $name | sed 's/\.create$/.sql/'` | mysql
done

You obviously have to be in the directory with the files (or edit find’s specification) and logged in as root (or pass a suitable login to the mysql utility as parameters).
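
For example, a single pair can be loaded by hand with explicit credentials (exampledb is a hypothetical name; -p makes mysql prompt for the password):

cat exampledb.create exampledb.sql | mysql -u root -p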

You’ll also want to set the user/password combination on these. The tarball will have a file called mysql.sql in its root directory. Just feed it in thus:

mysql < mysql.sql

Please be aware that I figured this out looking at the files in the dump and NOT by reading any magic documentation. It works on the version of cPanel I encountered, and I was restoring to FreeBSD. By all means add a comment if you have a different experience when you try it, and don’t go this way if you’re not sure how to operate a MySQL database or you could do a lot of damage!

The final hurdle is configuring Apache for all these new sites. cPanel creates a directory in the dump called “userdata”, and this seems to contain a file with information about each web site. I decided to automate and wrote the following script:


#!/bin/sh

# Convert cPanel dump of "userdata" in to a series of Apache .conf files
# (c) F J Leonhardt 17 April 2014 - www.fjl.co.uk
# You may use this script for your own purposes, but must not distribute it without the copyright message above left intact

# Directory to write config files
# Normally /usr/local/etc/apache22/Include but you might want to write
# them somewhere else to check them first!

confdir=/usr/local/etc/apache22/Include

# oldhome and newhome are the old and new home directories (where the web sites are stored)
# oldtestname and newtestname are used (together with a sub-domain) to implement test web sites before
# they have a real domain name pointed at them. They will be substituted in server names and aliases

oldhome=/data03/exampleuser/public_html
newhome=/home/exampleuser/www
oldtestname=exampleuser.oldisp.co.uk
newtestname=newuser.fjl.org.uk

# Now some static information to add to all virtual hosts
# vhost is the IP address or hostname you're using for virtual hosting (i.e. the actual name of the server)
# serveradmin is the email address of the server admin
# logdir is the directory you want to put the log files in (assuming you're doing separate ones).
# Comment out the ErrorLog/CustomLog lines below if you don't want them

vhost=web.exampleuser.com
serveradmin=yourname@example.com
logdir=/var/log

getvalue()
{
grep ^$1: $name | sed s!$1:\ !! | sed s!$oldtestname!$newtestname!
}

# Start of main loop. We DO NOT want to process a special file in the directory called "main",
# so a check is made.

for name in `ls`; do
if [ "$name" != "main" ]
then
echo -n "Processing $name "

if grep ^servername: $name >/dev/null
then

# First we get some info from the file

sitename=`getvalue servername`
serveralias=`getvalue serveralias`
documentroot=`getvalue documentroot`

# Below we're setting the .conf pathname based on the first part of the file name (up to the first '.')
# This assumes that the file names are in the form websitename.isp.test.domain.com
#
# If the sitename in the source file is actually the name of the site (rather than a test alias) use
# this instead with something like:
#
# Basically, you want to end up with $givensitename as something meaningful when you see it
#
#givensitename=$sitename

givensitename=`echo $name | cut -d \. -f1`

confname=$confdir/$givensitename.conf

echo to $confname

echo "" >$confname
echo -e \\tServerAdmin $serveradmin >>$confname
echo -e \\tServerName $sitename >>$confname
for aname in $serveralias; do
echo -e \\tServerAlias $aname >>$confname
done
echo -e \\tDocumentRoot `echo $documentroot | sed s!$oldhome!$newhome!` >>$confname
echo -e \\tErrorLog $logdir/$givensitename-error.log >>$confname
echo -e \\tCustomLog $logdir/$givensitename-access.log combined >>$confname
echo "
" >>$confname

#from check that servername present
else
echo "- ignoring file - no servername therefore wrong format?"
fi

#fi from check it wasn't called "main"
fi
done
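
Once the .conf files are generated, Apache has to pick them up. A sketch, assuming a stock FreeBSD Apache 2.2 install where httpd.conf already Includes the directory used above:

# check the generated files parse cleanly, then reload Apache
apachectl configtest
apachectl graceful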

All of the above assumes you’re familiar with setting up virtual hosting on an Apache 2.2 HTTP server in a UNIX-like environment. It’s just too complicated to explain in a single blog post. Drop me a line if you need assistance.

Email addresses used by comment spammers on WordPress

On studying the behaviour of comment spammers I became interested in the email addresses they used. Were they genuine, and where were they from? Of course they’re not likely to be genuine, but it is possible to force commenters to register with a working address if they want their comments to appear – even if the comments never actually will. Here’s what I found:

When the spammers were required to register, these are the domain names they registered with:

Domain Percent
hotmail.com 25%
mailnesia.com 19%
Others (unique) 16%
gmail.com 7%
o2.pl 7%
outlook.com 5%
emailgratis.info 4%
gmx.com 2%
poczta.pl 2%
yahoo.com 2%
more-infos-about.com 1%
aol.com 1%
go2.pl 1%
katomcoupon.com 1%
tlen.pl 1%
acity.pl 1%
dispostable.com 1%
live.com 1%
mail.ru 1%
se.vot.pl 1%
acoustirack.com <1%
butala.htsail.pl <1%
cibags.com <1%
eiss.xoxi.pl <1%
justmailservice.info <1%
laposte.net <1%
pimpmystic.com <1%
twojewlasnem.pl <1%
wp.pl <1%

Where the authenticity of the address is more questionable (although the sample is a lot larger), the figures are as follows:

Domain Percent
gmail.com 40%
yahoo.com 11%
Other (unique) 6%
hotmail.com 6%
aol.com 4%
ymail.com 2%
googlemail.com 2%
gawab.com 2%
bigstring.com 1%
zoho.com 1%
t-online.de 1%
inbox.com 1%
web.de 1%
yahoo.de 1%
arcor.de 1%
live.com 1%
freenet.de 1%
yahoo.co.uk 1%
comcast.net 1%
mail.com 1%
gmx.net 1%
gmx.de 1%
outlook.com <1%
live.cn <1%
hotmail.de <1%
msn.com <1%
livecam.edu <1%
google.com <1%
live.de <1%
rocketmail.com <1%
gmail.ocm <1%
wildmail.com <1%
moose-mail.com <1%
hotmail.co.uk <1%
care2.com <1%
certify4sure.com <1%
snail-mail.net <1%
1701host.com <1%
cwcom.net <1%
maill1.com <1%
wtchorn.com <1%
chinaadv.com <1%
noramedya.com <1%
o2.pl <1%
vegemail.com <1%
vp.pl <1%
24hrsofsales.com <1%
kitapsec.com <1%
peacemail.com <1%
whale-mail.com <1%
wp.pl <1%
aim.com <1%
animail.net <1%
bellsouth.net <1%
blogs.com <1%
email.it <1%
mailcatch.com <1%
rady24.waw.pl <1%
titmail.com <1%
fastemail.us <1%
btinternet.com <1%
harvard.edu <1%
onet.pl <1%
yahoo (various international) <1%
akogoto.org <1%
concorde.edu <1%
freenet.com <1%
leczycanie.pl <1%
mail15.com <1%
speakeasy.net <1%
yale.edu <1%
123inholland.co.nl <1%
SolicitorsWorld.com <1%
apemail.com <1%
buysellonline.in <1%
email.com <1%
help.com <1%
ipad2me.com <1%
ismailaga.org.tr <1%
live.fr <1%
myfastmail.com <1%
mymail.com <1%
ngn.si <1%
redpaintclub.co.uk <1%
stonewall42.plus.com <1%
traffic.seo <1%
xt.net.pl <1%
a0h.net <1%
accountant.com <1%
alphanewsroom.com <1%
att.net <1%
auctioneer.com <1%
brandupl.com <1%
canplay.info <1%
charter.net <1%
cluemail.com <1%
darkcloudpromotion.com <1%
earthlink.com <1%
earthlink.net <1%
eeemail.pl <1%
emailuser.net <1%
excite.com <1%
fastmail.net <1%
gmai.com <1%
gouv.fr <1%
h-mail.us <1%
hotmail.ca <1%
hotmailse.com <1%
hotmalez.com <1%
imajl.pl <1%
jmail.com <1%
juno.com <1%
live.co.uk <1%
mac.com <1%
mailandftp.com <1%
mailas.com <1%
mailbolt.com <1%
mailnew.com <1%
mailservice.ms <1%
modeperfect3.fr <1%
mymacmail.com <1%
nyc.gov <1%
op.pl <1%
peoplepc.com <1%
petml.com <1%
pornsex.com <1%
qwest.net <1%
rosefroze.com <1%
sbcglobal.net <1%
ssl-mail.com <1%
t-online.com <1%
thetrueonestop.com <1%
turk.net <1%
virgilio.it <1%
virginmedia.com <1%
windstream.net <1%
yaahoo.co.uk <1%
yahoo.com.my <1%
yazobo.com <1%
yopmail.com <1%
zol.com <1%

A few words of warning here. First, these figures are taken from comments that made it through the basic spam filter. Currently 90% of comments are rejected by a heuristic, and even more are blocked by their IP address, so these are probably from real people who persisted rather than bots. They’re also sorted by number of hits and then alphabetically – in other words, ranked from worst to best, so zol.com has the fewest (or joint-fewest) multiple uses.

It’s interesting to note that Gmail was by far the most popular choice (40%) when merely asked to provide a valid email address, but when the address had to be used to register, that dropped to 7%, with Hotmail the favourite, followed by other freemail services popular in Eastern Europe and Russia (many single-use and counted under “Others”). Does this mean that Gmail users get more hassle from Google when they misbehave? outlook.com showed an even bigger reduction in percentage terms – again suggesting it’s a favourite with abusers.

Another one worth noting is that mailnesia.com was clearly popular as a real address for registering spammers, but was not used even once as a fake address. This is another of those disposable email address web sites, Panamanian-registered – probably worth blacklisting. emailgratis.info is also Panamanian-registered but heads to anonymous servers that appear to be in North Carolina.

Where you see <1% it means literally that, but it’s not insignificant. It could still mean hundreds of hits, as this is a sample of well over 20K attempts.

If you have a WordPress blog and wish to extract the same data, here’s how. This assumes that the MySQL database you’re using is called myblog, which of course it isn’t. The first file we’ll create is for registered users. It will consist of lines in the form domain <tab> hit count:

echo 'select user_email from wp_users ;' | mysql myblog | sed 1d | tr @ ' ' | awk '{ print $2 }' | sed '/^$/d' | sort | uniq -c | sort -n | awk '{ print $2 "\t" $1}' > registered-emails.txt

I have about a dozen registered users, and thousands of spammers, so there’s no real need to exclude the genuine ones for the statistics, but if it worries you, this will get a list of registered users who have posted valid comments:

select distinct user_email from wp_users join wp_comments where not comment_approved='spam' and ID=user_id;
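
If you want that in a file too, run it through the same sort of pipeline as before:

echo "select distinct user_email from wp_users join wp_comments where not comment_approved='spam' and ID=user_id;" | mysql myblog | sed 1d > genuine-emails.txt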

To get a file of the email domains used by all those people who’ve posted a comment you’ve marked as spam, the following command is what you need:

echo "select comment_author_email from wp_comments where comment_approved='spam';" | mysql myblog | sed 1d | tr @ ' ' | awk '{ print $2 }' | sed '/^$/d' | sort | uniq -c | sort -n | awk '{ print $2 "\t " $1}' > spammer-emails.txt

If you want a list of IP addresses instead, try:

echo "select comment_author_IP from wp_comments where comment_approved='spam';" | mysql myblog | sed 1d | sort | uniq -c | sort -n | awk '{ print $2 "\t " $1}' > spammer-ip-addresses.txt

As I firewall out the worst offenders there’s no point in me publishing the results.

If you find out any interesting stats, do leave a comment.

PHP PDO driver missing on FreeBSD

I got an email today – a FreeBSD 8.2 web server installation with Apache and PHP 5.3 was missing its MySQL driver. No it wasn’t, I protested. But this error was appearing:

[error] [exception.CDbException] could not find driver
[error] [exception.CDbException] exception 'CDbException' with message 'CDbConnection failed to open the DB

“It’s not the database, it’s the PDO module”, came the explanation. Actually the pdo.so module was there and loaded, and it was installed in just the same way as it had been on earlier versions. So what was going on?
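
A quick way to see which PDO drivers PHP actually has loaded is to list its modules from the command line:

php -m | grep -i pdo
# a healthy install lists pdo_mysql as well as PDO itself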

It turns out that as of PHP 5.3 there has been a change. You no longer get the important module – pdo_mysql – when you compile /usr/ports/lang/php5-extensions. It’s not that it’s no longer checked by default – it’s not even there! I’m told this applies to Postgres too.

Further investigation revealed that PHP 5.3 has a new native-mode driver (mysqlnd), and this is now outside the standard install. If you’re using SQLite for your PHP Data Objects, no problem, but if you’re on a more restrictive hosting account and therefore need to store them in a MySQL (or Postgres) database, you seem to be out of luck.

For more information, see here.

However, the required module can still be found. Go to

/usr/ports/databases/php5-pdo_mysql

(If you’re using Postgres go to php5-pdo_pgsql instead – see Geoffrey McRae’s comment below.)

Run make and make install on what you find there. Also, make sure you compiled the original php5-extensions with mysql and mysqli, and don’t forget to restart Apache afterwards!
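
In other words, the usual ports incantation – something like:

cd /usr/ports/databases/php5-pdo_mysql
make install clean
/usr/local/etc/rc.d/apache22 restart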

Note that it’s quite possible that none of the above relates to Linux. It’s to do with what gets installed from where with the FreeBSD ports tree.

WordPress ends up with wrong upload directory after moving servers

If you’ve moved WordPress from one server (or home directory) to another, everything looks fine until you come to upload a file. It then complains that it can’t write to the upload directory. Something like:


Unable to create directory /usr/home/username/wp-content/uploads/xxx/yy. Is its parent directory writable by the server?

A search through the PHP config files doesn’t reveal a path to the upload directory, and if it’s possible to change it from the Dashboard I’ve yet to find where.

The only remaining option was the MySQL database, and fishing around in the likely-sounding wp_options table I found an option_name of “upload_path”. A quick query showed this was still set to the old home directory.
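
For the record, the quick query in question was nothing more than this (substitute your own database name for myblog):

echo "select option_value from wp_options where option_name='upload_path';" | mysql myblog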

To save you fishing, here’s the SQL needed to set things right. Obviously select the appropriate database with “use” first!


update wp_options
set option_value='<path-to-your-wordpress-installation>/wp-content/uploads'
where wp_options.option_name='upload_path';

This is how you do it using straight SQL from the command line. If you’re using some sort of restricted web hosting account, “see your system administrator” and point them here.

Sage 2011? Line 50 with a proper database

Today I ran into my “old friends” Sage at a computer show; they didn’t recognise me and tried to interest me in Sage Accounting for my business. I was wearing a suit, I suppose. As you can imagine, it didn’t take long for them to catch on, after which I turned the conversation to the subject of Line 50 using a proper database.

You might have got the impression I really don’t like Sage. That’s not strictly true; the issue is that I really don’t like the idea of Line 50 being sold to SMEs planning to use it for anything non-trivial. Interestingly, the people from Sage agree – at least in private. The database-driven version to cure the problem has been promised for four years, so they’re obviously aware of the issue!

So when’s it coming? Apparently in Sage 2011, due out in August 2010. “Really?” I said. “Yes, definitely. At least that’s the plan”, they said.

I pushed a bit further – would it be using MySQL as promised, or would they wimp out and use the lightweight Microsoft server? I didn’t get an answer, which confirms my fears, but even a Microsoft SQL server is better than the current arrangement.

I tried to discuss the performance issues for people upgrading to Line 50 Version 2010 with them, but I got the impression they were a bit jaded on the subject, and did a very poor job of feigning surprise.

Why is Sage Line 50 so slow?

NB. If you want to know how to make Sage run faster click here for later posts, and read the comments below (there are a lot!).

As regular readers will know, I don’t think much of Sage accounting software, especially Sage Line 50. It’s fatally flawed because it stores its data in disk files, shared across a network using a file server. I suspect these .DTA files are pretty much unchanged since Graham Wylie’s original effort running under CP/M on an Amstrad PCW. There is continual talk that the newer versions will use a proper database; indeed, in 2006 they announced a deal to work with MySQL. But the world has been waiting for the upgrade ever since. It’s always coming in “next year’s” release, but “next year” never comes. The latest (as of December 2009) is that they’re ‘testing’ a database version with some customers and it might come out in version seventeen.

In fact it’s in Sage’s interests to keep Line 50 running slower than a slug in treacle. Line 50 is the cheap end of the range – if it ran at a decent speed over a network, multi-user, people wouldn’t buy the expensive Line 200 (aka MMS). The snag is that Line 50 is sold to small companies that do need more than one or two concurrent users and do have a significant number of transactions a day.

So why is Line 50 so slow? The problem with Sage’s strategy of storing data in shared files is that, with multiple users, those files are opened/locked/read/written across the network simultaneously. On a non-trivial set of books this will involve a good number of files, some of them very large. Networks are slow and unreliable compared to local disks, so you’re bound to end up with file-locking conflicts, and would be lucky if data wasn’t corrupted from time to time. As the files get bigger and the number of users grows, the problem gets rapidly worse. The standard Sage solution seems to be to tell people their hardware is inadequate whenever timeouts occur. In a gross abuse of their consultancy position, some independent Sage vendors have been known to sell hapless lusers new high-powered servers, which does make the problem appear to go away. Until, of course, the file gets a bit bigger. Anyone who knows anything about networking will realise straight away that this is a hopeless situation – but not those selling Sage, at least in public.

One Sage Solution Provider, realising that this system was always going to time out in such circumstances, persuaded the MD of the company using it to generate all reports by sitting at the server console. To keep up the pretence that this was a multi-user system, he even persuaded them to install it on a Windows Terminal Server machine so more than one person could use it via remote sessions.

If that weren’t bad enough, apparently it didn’t even work when sitting at the console, so they advised the customer to get a faster router. I’m not kidding – this really did happen.

The fact is that Sage Line 50 does not run well over a network due to a fundamental design flaw. It’s fine if it’s basically single-user on one machine, and I have clients using it this way. If you want to run multiple users, especially if your books are non-trivial, you need Sage 200/MMS – or a different accounting package altogether.