More comment spammer email analysis

Since my earlier post, I decided to see what change there had been in the email addresses used by comment spammers to register. Here are the results:


Freemail Service  % 22% 20% 14% 8% 6% 6% 3% 2% 2% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1%

As before, domains with <1% are still significant; it’s a huge sample. I’ve only excluded domains with <10 actual attempts.

The differences from 18 months ago are interesting. Firstly, has dropped from 19% to 6% – however this is because the spam system has decided to block it! Hotmail is also slightly less and Gmail and AOL are about the same. The big riser is Yahoo, followed by (which had the highest percentage rise of them all). O2 in Poland is still strangely popular.

If you want to know how to extract the statistics for yourself, see my earlier post.

Email addresses used by comment spammers on WordPress

On studying the behaviour of comment spammers I became interested in the email addresses they used. Were they genuine and where were they from? Well of course they’re not likely to be genuine, but it is possible to force them to register with an address if they want their comments to appear – even if they don’t. Here’s what I found:

When the spammers were required to register, these are the domain names they registered with:

Domain Percent 25% 19%
Others (unique) 16% 7% 7% 5% 4% 2% 2% 2% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% <1% <1% <1% <1% <1% <1% <1% <1% <1%

Where the authenticity of the address is more questionable, although the sample a lot larger, the figures are as follows:

Domain Percent 40% 11%
Other (unique) 6% 6% 4% 2% 2% 2% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% <1% <1% <1% <1% <1% <1% <1% <1%
gmail.ocm <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1%
yahoo (various international) <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1%
traffic.seo <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1% <1%

A few words of warning here. First, these figures are taken from comments that made it through the basic spam filter. Currently 90% of comments are rejected using a heuristic, and even more blocked by their IP address, so these are probably from real people who persisted rather than bots. They’re also sorted in order of hits and then alphabetically. In other words, they are ranked from worst to best, and therefore has least, or equal-least, multiple uses.

It’s interesting to note that gmail was by far the most popular choice (40%) when asked to provide a valid email address but when this was used to register this dropped to 7%, with Hotmail being the favourite followed by other freemail services popular in East Europe and Russia (many single-use and counted under “Other”). Does this mean that Gmail users get more hassle from Google when they misbehave? The use of had an even bigger reduction in percentage terms – again suggesting it’s a favourite with abusers.

Another one worth noting is that was clearly popular as a real address for registering spammers, but was not used even once as a fake address. This is another of those disposable email address web sites, Panamanian registered – probably worth blacklisting. is also Panamania registered but heads to anonymous servers that appear to be in North Carolina.

Where you see <1% it means literally that, but it’s not insignificant. It could still mean hundreds of hits, as this is a sample of well over 20K attempts.

If you have WordPress blog and wish to extract the data, here’s how. This assumes that the MySQL database your using is called myblog, which of course it isn’t. The first file we’ll create is that belonging to registered users. It will consist of lines in the form email address <tab> hit count:

Please generate and paste your ad code here. If left empty, the ad location will be highlighted on your blog pages with a reminder to enter your code. Mid-Post

echo 'select user_email from wp_users ;' | mysql myblog | sed 1d | tr @ ' ' | awk '{ print $2 }' | sed '/^$/d' | sort | uniq -c | sort -n | awk '{ print $2 "\t" $1}' > registered-emails.txt

I have about a dozen registered users, and thousands of spammers, so there’s no real need to exclude the genuine ones for the statistics, but if it worries you, this will get a list of registered users who have posted valid comments:

select distinct user_email from wp_users join wp_comments where not comment_approved='spam' and ID=user_id;

To get a file of the email addresses of all those people who’ve posted a comment you’ve marked as spam, the following command is what you need:

echo "select comment_author_email from wp_comments where comment_approved='spam';" | mysql myblog | sed 1d | tr @ ' ' | awk '{ print $2 }' | sed '/^$/d' | sort | uniq -c | sort -n | awk '{ print $2 "\t " $1}' > spammer-emails.txt

If you want a list of IP addresses instead, try:

echo "select comment_author_IP from wp_comments where comment_approved='spam';" | mysql myblog | sed 1d | sort | uniq -c | sort -n | awk '{ print $2 "\t " $1}' > spammer-ip-addresses.txt

As I firewall out the worse offenders there’s no point in me publishing the results.

If you find out any interesting stats, do leave a comment.

What is all this Zune comment spam about?

People running popular blogs are often targeted by comment spammers – this blog gets hit with at least 10,000 a year (and very useful for botnet research) – most of it is semi-literate drivel containing a link to some site being “promoted”. Idiots pay other idiots to do this because they believe it will increase their Google ranking. It doesn’t, but a fool and his money are soon parted and the comment spammers, although wasting everyone’s time, are at least receiving payment from the idiots of the second part.

But there’s a weird class of comment spam that’s been going for years which contains lucid, but repeated, “reviews” about something called a “Zune”. It turns out that this is a Microsoft MP3 player available in the USA. The spams contain a load of links, and I assume that the spammers are using proper English (well, American English) in an attempt to get around automated spam filters that can spot the broken language of the third-world spam gangs easily enough. But they do seem to concentrate on the Zune media player rather than other topics. Blocking them is easy: just block any comment with the word “Zune” in, as it doesn’t appear in normal English. Unless, of course, your blog is about media players available in the USA.

This really does beg the question: why are these spammers sicking to one subject with a readily identified filter signature? I’ve often wondered if they’re being paid by a Microsoft rival to ensure that the word “Zune” appears in every spam filter on the planet, thus ensuring that no “social media” exposure exists for the product. Or is this just a paranoid conspiracy theory?

An analysis of the sources shows that nearly all of this stuff is coming from dubious server hosting companies.  A dubious hosting company is one that doesn’t know/care what its customers are doing, as evidenced by continued abuse and lack of response to complaints. There’s one in Melbourne (Telstra!) responsible for quite a bit of it, and very many in South Korea plus a smattering in Europe, all of which are “one-time” so presumably they’re taking complains seriously even if they’re not vetting beforehand. It’s hard to be sure about the Koreans – there are a lot but there’s evidence they might be skipping from one hosting company to the other. Unusually for this kind of abuse there are very few in China and Eastern Europe, and only the odd DSL source. These people don’t seem to be making much use of botnets.

So, one wonders, what’s their game? Could it be they’re buying hosting space and appearing to behave themselves by posting reasonable-looking but irrelevant comments? Well any competent server operators could detect comment posting easily enough, but in the “cheap” end of the market they won’t have the time or even the minimal knowledge to do this.

I did wonder if they were using VPN endpoints for this, but as there’s no reverse-lookup in the vast majority of cases it’s unlikely to be any legitimate server.

Comment spam from Volumedrive

Comment spammers aren’t the sharpest knives in the draw. If they did their research properly they’d realise that spamming here was a stupid as trying to burgle the police station (while it’s open). You’ll notice there’s no comment spam around here, but that isn’t to say they don’t try.

Anyway, there’s been a lot of activity lately from a spambot running at an “interesting” hosting company called Volumedrive. They rent out rack space, so it’s not going to be easy for them to know what their customers are doing, but they don’t seem inclined to shut any of them down for “unacceptable” use. For all I know they’ve got a lot of legitimate customers, but people do seem to like running comment spammers through their servers.

If you need to get rid of them, there is an easy way to block them completely if you’re running WordPress, even if you don’t have full access to the server and its firewall. The trick is to over-ride the clients Apache is prepared to talk to (default: the whole world) by putting a “Deny from” directive in the .htaccess file. WordPress normally creates a .htaccess file in its root directory; all you do is add:

Deny from

Here, “” is the server sending you the spam, but in reality they probably haven’t called themselves anything so convenient. The Apache documentation isn’t that explicit unless you read the whole lot, so it’s worth knowing you can actually list IP addresses (more than one per line) and even ranges of IP addresses (subnets).

For example:

Deny from
Deny from
Deny from

The last line blocks everything from to If you don’t know why, please read up on IP addresses and subnet masks (or ask below in a comment).

So when you get a a load of spammers from similar IP addresses, look up to see who the block belongs to using “whois”. Once you know you can block the whole lot. For example, if you’re being hit by the bot using Volumedrive on, run “whois”. This will return:

NetRange: -
OriginAS: AS46664
NetHandle: NET-173-242-112-0-1
Parent: NET-173-0-0-0-0
NetType: Direct Allocation


If you don’t have whois on your comptuer (i.e. you’re using Windoze) there’s a web version at

In the above, the CIDR is the most interesting – it specifies the block of IP addresses routed to one organisation. I’m not going in to IP routing here and now, suffice to say that in this example it specifies the complete block of addresses belonging to volumedrive that we don’t want – at least until they clean up their act.

To avoid volumedrive’s spambots you need to add the following line to the end your .htaccess file:

Deny from

If this doesn’t work for you the the web server you’re using may have been configured in a strange way – talk to your ISP if they’re the approachable type.

I have contacted Volumedrive, but they declined to comment, or even reply; never mind curtail the activities of their users.

This isn’t a WordPress-only solution – .htaccess belongs to Apache and you can use it to block access to any web site.

Perhaps there’s some scope in sharing a list these comment spambots in an easy-to-use list. If anyone’s interested, email me. This is a Turing test :-)