While studying the behaviour of comment spammers I became interested in the email addresses they used. Were they genuine, and where were they from? Of course they’re not likely to be genuine, but it is possible to force spammers to register with a working address if they want their comments to appear, even if the comments never do. Here’s what I found:
When the spammers were required to register, these are the domain names they registered with:
Domain | Percent
hotmail.com | 25%
mailnesia.com | 19%
Others (unique) | 16%
gmail.com | 7%
o2.pl | 7%
outlook.com | 5%
emailgratis.info | 4%
gmx.com | 2%
poczta.pl | 2%
yahoo.com | 2%
more-infos-about.com | 1%
aol.com | 1%
go2.pl | 1%
katomcoupon.com | 1%
tlen.pl | 1%
acity.pl | 1%
dispostable.com | 1%
live.com | 1%
mail.ru | 1%
se.vot.pl | 1%
acoustirack.com | <1%
butala.htsail.pl | <1%
cibags.com | <1%
eiss.xoxi.pl | <1%
justmailservice.info | <1%
laposte.net | <1%
pimpmystic.com | <1%
twojewlasnem.pl | <1%
wp.pl | <1%
Where the authenticity of the address is more questionable, although the sample is a lot larger, the figures are as follows:
Domain | Percent
gmail.com | 40%
yahoo.com | 11%
Other (unique) | 6%
hotmail.com | 6%
aol.com | 4%
ymail.com | 2%
googlemail.com | 2%
gawab.com | 2%
bigstring.com | 1%
zoho.com | 1%
t-online.de | 1%
inbox.com | 1%
web.de | 1%
yahoo.de | 1%
arcor.de | 1%
live.com | 1%
freenet.de | 1%
yahoo.co.uk | 1%
comcast.net | 1%
mail.com | 1%
gmx.net | 1%
gmx.de | 1%
outlook.com | <1%
live.cn | <1%
hotmail.de | <1%
msn.com | <1%
livecam.edu | <1%
google.com | <1%
live.de | <1%
rocketmail.com | <1%
gmail.ocm | <1%
wildmail.com | <1%
moose-mail.com | <1%
hotmail.co.uk | <1%
care2.com | <1%
certify4sure.com | <1%
snail-mail.net | <1%
1701host.com | <1%
cwcom.net | <1%
maill1.com | <1%
wtchorn.com | <1%
chinaadv.com | <1%
noramedya.com | <1%
o2.pl | <1%
vegemail.com | <1%
vp.pl | <1%
24hrsofsales.com | <1%
kitapsec.com | <1%
peacemail.com | <1%
whale-mail.com | <1%
wp.pl | <1%
aim.com | <1%
animail.net | <1%
bellsouth.net | <1%
blogs.com | <1%
email.it | <1%
mailcatch.com | <1%
rady24.waw.pl | <1%
titmail.com | <1%
fastemail.us | <1%
btinternet.com | <1%
harvard.edu | <1%
onet.pl | <1%
yahoo (various international) | <1%
akogoto.org | <1%
concorde.edu | <1%
freenet.com | <1%
leczycanie.pl | <1%
mail15.com | <1%
speakeasy.net | <1%
yale.edu | <1%
123inholland.co.nl | <1%
SolicitorsWorld.com | <1%
apemail.com | <1%
buysellonline.in | <1%
email.com | <1%
help.com | <1%
ipad2me.com | <1%
ismailaga.org.tr | <1%
live.fr | <1%
myfastmail.com | <1%
mymail.com | <1%
ngn.si | <1%
redpaintclub.co.uk | <1%
stonewall42.plus.com | <1%
traffic.seo | <1%
xt.net.pl | <1%
a0h.net | <1%
accountant.com | <1%
alphanewsroom.com | <1%
att.net | <1%
auctioneer.com | <1%
brandupl.com | <1%
canplay.info | <1%
charter.net | <1%
cluemail.com | <1%
darkcloudpromotion.com | <1%
earthlink.com | <1%
earthlink.net | <1%
eeemail.pl | <1%
emailuser.net | <1%
excite.com | <1%
fastmail.net | <1%
gmai.com | <1%
gouv.fr | <1%
h-mail.us | <1%
hotmail.ca | <1%
hotmailse.com | <1%
hotmalez.com | <1%
imajl.pl | <1%
jmail.com | <1%
juno.com | <1%
live.co.uk | <1%
mac.com | <1%
mailandftp.com | <1%
mailas.com | <1%
mailbolt.com | <1%
mailnew.com | <1%
mailservice.ms | <1%
modeperfect3.fr | <1%
mymacmail.com | <1%
nyc.gov | <1%
op.pl | <1%
peoplepc.com | <1%
petml.com | <1%
pornsex.com | <1%
qwest.net | <1%
rosefroze.com | <1%
sbcglobal.net | <1%
ssl-mail.com | <1%
t-online.com | <1%
thetrueonestop.com | <1%
turk.net | <1%
virgilio.it | <1%
virginmedia.com | <1%
windstream.net | <1%
yaahoo.co.uk | <1%
yahoo.com.my | <1%
yazobo.com | <1%
yopmail.com | <1%
zol.com | <1%
A few words of warning here. First, these figures are taken from comments that made it through the basic spam filter. Currently 90% of comments are rejected using a heuristic, and even more are blocked by their IP address, so these are probably from real people who persisted rather than bots. Second, the tables are sorted by number of hits, highest first, and then alphabetically. In other words, they are ranked from worst to best, and therefore zol.com has the fewest, or equal-fewest, uses.
It’s interesting to note that Gmail was by far the most popular choice (40%) when spammers were merely asked to provide a valid email address, but when the address had to be used to register this dropped to 7%, with Hotmail being the favourite, followed by other freemail services popular in Eastern Europe and Russia (many single-use and counted under “Other”). Does this mean that Gmail users get more hassle from Google when they misbehave? The use of outlook.com rose by an even bigger factor than Hotmail’s, from under 1% to 5%, again suggesting it’s a favourite with abusers.
Another one worth noting is that mailnesia.com was clearly popular as a real address for registering spammers, but was not used even once as a fake address. This is another of those disposable email address web sites, Panamanian-registered and probably worth blacklisting. emailgratis.info is also Panamanian-registered but points to anonymous servers that appear to be in North Carolina.
Where you see <1% it means literally that, but it’s not insignificant. It could still mean hundreds of hits, as this is a sample of well over 20K attempts.
If you have a WordPress blog and wish to extract the data, here’s how. This assumes that the MySQL database you’re using is called myblog, which of course it isn’t. The first file we’ll create is that belonging to registered users. It will consist of lines in the form domain <tab> hit count, least-used domain first:
echo 'select user_email from wp_users ;' | mysql myblog | sed 1d | tr @ ' ' | awk '{ print $2 }' | sed '/^$/d' | sort | uniq -c | sort -n | awk '{ print $2 "\t" $1}' > registered-emails.txt
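The counting half of that pipeline can be tried without a database. This is a minimal sketch, assuming one address per line on standard input. The function name count_domains is my own invention, and unlike the command above it folds domains to lower case so that Gmail.com and gmail.com are counted together:

```shell
#!/bin/sh
# count_domains (hypothetical helper name): read one email address per line
# on stdin and print "domain<TAB>count", least-used domain first.
# Lines without exactly one @, or with nothing after it, are skipped.
count_domains() {
    awk -F@ 'NF == 2 && $2 != "" { print tolower($2) }' |
        sort | uniq -c | sort -n |
        awk '{ print $2 "\t" $1 }'
}

# Example with made-up addresses:
printf '%s\n' a@example.com b@Example.com c@example.org | count_domains
```

To use it on real data, pipe the output of the mysql command into count_domains in place of the tr, awk, sort and uniq stages.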
I have about a dozen registered users, and thousands of spammers, so there’s no real need to exclude the genuine ones for the statistics, but if it worries you, this will get a list of registered users who have posted valid comments:
select distinct user_email from wp_users join wp_comments on wp_users.ID = wp_comments.user_id where comment_approved <> 'spam';
To get a file of the email addresses of all those people who’ve posted a comment you’ve marked as spam, the following command is what you need:
echo "select comment_author_email from wp_comments where comment_approved='spam';" | mysql myblog | sed 1d | tr @ ' ' | awk '{ print $2 }' | sed '/^$/d' | sort | uniq -c | sort -n | awk '{ print $2 "\t" $1}' > spammer-emails.txt
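A file of domain <tab> hit count lines can then be turned into a percentage table like the ones earlier in this post. The sketch below is one way to do it, not necessarily how those tables were produced: the function name percent_table is hypothetical, and pooling every single-use domain into one “Others (unique)” row is my assumption about what that row means.

```shell
#!/bin/sh
# percent_table (hypothetical helper name): read "domain<TAB>count" lines on
# stdin and print "domain | share" rows, biggest share first, ties in
# alphabetical order. Domains seen exactly once are pooled into a single
# "Others (unique)" row; that pooling is an assumption, not a documented rule.
percent_table() {
    tab=$(printf '\t')
    awk -F'\t' '
        NF == 2 { total += $2 }
        NF == 2 && $2 == 1 { unique++; next }   # pool single-use domains
        NF == 2 { count[$1] = $2 + 0 }
        END {
            if (unique > 0) count["Others (unique)"] = unique
            for (d in count) {
                pct = 100 * count[d] / total
                # Shares below one percent are shown as "<1%".
                label = (pct < 1) ? "<1%" : sprintf("%d%%", pct + 0.5)
                printf "%d\t%s\t%s\n", count[d], d, label
            }
        }' |
        LC_ALL=C sort -t "$tab" -k1,1nr -k2,2 |
        awk -F'\t' '{ print $2 " | " $3 }'
}

# Example with made-up counts:
printf 'gmail.com\t5\nyahoo.com\t3\na.example\t1\nb.example\t1\n' | percent_table
```

On real data it would be run as percent_table < spammer-emails.txt. Percentages are rounded to the nearest whole number, so two rows showing the same figure can still differ in hit count.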
If you want a list of IP addresses instead, try:
echo "select comment_author_IP from wp_comments where comment_approved='spam';" | mysql myblog | sed 1d | sort | uniq -c | sort -n | awk '{ print $2 "\t" $1}' > spammer-ip-addresses.txt
As I firewall out the worst offenders there’s no point in me publishing the results.
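If you want to pick out your own worst offenders from spammer-ip-addresses.txt, something along these lines would do it. This is only a sketch: the function name worst_offenders and the default threshold of 10 hits are made up for illustration, and what you then feed the list into depends on your firewall.

```shell
#!/bin/sh
# worst_offenders (hypothetical helper name): read "ip<TAB>count" lines on
# stdin and print the addresses with at least MIN hits, busiest first.
# MIN is the first argument; the default of 10 is an arbitrary choice.
worst_offenders() {
    min=${1:-10}
    tab=$(printf '\t')
    awk -F'\t' -v min="$min" 'NF == 2 && $2 + 0 >= min + 0 { printf "%d\t%s\n", $2, $1 }' |
        LC_ALL=C sort -t "$tab" -k1,1nr -k2,2 |
        cut -f2
}

# Example with made-up counts (addresses from the documentation ranges):
printf '192.0.2.1\t3\n198.51.100.7\t25\n203.0.113.9\t12\n' | worst_offenders 10
```

On real data it would be run as worst_offenders 50 < spammer-ip-addresses.txt, with whatever threshold suits your traffic.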
If you find out any interesting stats, do leave a comment.