Akismet, a centralised spam-combating solution by Matt

Matt Mullenweg has announced a new spam-combating service, Akismet. If you are a personal blogger, or a pro-blogger wannabe who makes no more than $500 a month, Akismet is free for you to use. If yours is a commercial site, or you are making big bucks from your blogs, you need a commercial license, starting from $5/month.

So, how does Akismet catch spam? How does it reduce false positives? What sort of algorithm does it use? Well, hmmm. We don't know. Akismet is a centralised spam-classifying service. Every comment your blog receives is sent to a centralised server through a REST-based API. If the big brain on that server doesn't like it, it yells back "Spam!!!" and the comment is marked accordingly.
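To make the round trip concrete, here is a minimal sketch of what a client might send and how it might read the reply. It assumes the `comment-check` endpoint and form field names (`api_key`, `blog`, `user_ip`, `user_agent`, `comment_type`, `comment_content`) as described in Akismet's public 1.1 API docs, with a plain-text `true` meaning spam and `false` meaning ham; treat them as assumptions and check the current documentation. The helper names are hypothetical, and the sketch only builds the request without sending it.

```python
from urllib.parse import urlencode

# Assumed endpoint for Akismet's 1.1 REST API; verify against the current docs.
COMMENT_CHECK_URL = "https://rest.akismet.com/1.1/comment-check"


def build_comment_check(api_key: str, blog_url: str, user_ip: str,
                        user_agent: str, content: str) -> tuple[str, bytes]:
    """Return (url, form-encoded body) for a hypothetical comment-check POST."""
    fields = {
        "api_key": api_key,          # assumed field name
        "blog": blog_url,            # the blog's front page URL
        "user_ip": user_ip,          # commenter's IP address
        "user_agent": user_agent,    # commenter's browser user agent
        "comment_type": "comment",
        "comment_content": content,
    }
    return COMMENT_CHECK_URL, urlencode(fields).encode("utf-8")


def is_spam(response_body: str) -> bool:
    """Interpret the assumed plain-text reply: 'true' is spam, 'false' is ham."""
    body = response_body.strip()
    if body == "true":
        return True
    if body == "false":
        return False
    raise ValueError(f"unexpected response: {body!r}")
```

The request could then be sent with `urllib.request.urlopen(urllib.request.Request(url, data=body))`, and the decoded body passed to `is_spam`.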

So, how does this centralised server decide whether a comment is ham or spam? According to the FAQ,

When a new comment, trackback, or pingback comes to your blog it is submitted to the Akismet web service which runs hundreds of tests on the comment and returns a thumbs up or thumbs down.

Hmm. Probably something like SpamAssassin, but for blog comments. According to Michael Hampton, it will "entirely replace plugins such as wp-hashcash, Spam Karma 2, AuthImage, etc", so I guess they borrowed ideas from some of those implementations. Further on, he mentions "integrating CJD's Spam Nuker", which gives us some idea of what kind of backend it has.
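"Hundreds of tests" returning a thumbs up or down sounds like SpamAssassin-style scoring: each rule that fires adds a weight, and the total is compared against a threshold. Below is a toy sketch of that technique; the rules, weights, and threshold are all hypothetical and say nothing about what Akismet actually checks.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    weight: float
    test: Callable[[str], bool]


# Hypothetical rules and weights, purely illustrative.
RULES = [
    Rule("MANY_LINKS", 2.5, lambda c: c.lower().count("http") >= 3),
    Rule("PILL_WORDS", 3.0,
         lambda c: re.search(r"\b(viagra|cialis|casino)\b", c, re.I) is not None),
    Rule("ALL_CAPS", 1.0, lambda c: len(c) > 20 and c.isupper()),
    Rule("VERY_SHORT", 0.5, lambda c: len(c.strip()) < 5),
]

THRESHOLD = 5.0  # assumed cut-off: total score at or above this means spam


def score(comment: str) -> tuple[float, list[str]]:
    """Sum the weights of every rule that fires and report which ones did."""
    hits = [r for r in RULES if r.test(comment)]
    return sum(r.weight for r in hits), [r.name for r in hits]


def verdict(comment: str) -> bool:
    """Thumbs down (True) when the total score reaches the threshold."""
    total, _ = score(comment)
    return total >= THRESHOLD
```

The appeal of this design is that no single rule has to be decisive; a comment needs several suspicious traits before it crosses the line.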

It also allows users to manually classify comments as spam or ham, which suggests it has some kind of Bayesian classifier that can be trained. That is useful for reporting all the false positives and false negatives.
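As an illustration of how user reports could train such a classifier, here is a minimal multinomial naive Bayes sketch with Laplace smoothing. It is a textbook technique swapped in for explanation, not Akismet's confirmed method; the class and method names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts, with Laplace smoothing."""

    def __init__(self) -> None:
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # Called whenever a user marks a comment as "spam" or "ham".
        self.words[label].update(tokens(text))
        self.docs[label] += 1

    def log_score(self, text: str, label: str) -> float:
        toks = tokens(text)
        # Vocabulary includes unseen query words so the denominator is never zero.
        vocab = set(self.words["spam"]) | set(self.words["ham"]) | set(toks)
        total_docs = self.docs["spam"] + self.docs["ham"]
        total_words = sum(self.words[label].values())
        # Smoothed prior P(label), then add log P(word | label) for each token.
        s = math.log((self.docs[label] + 1) / (total_docs + 2))
        for w in toks:
            s += math.log((self.words[label][w] + 1) / (total_words + len(vocab)))
        return s

    def is_spam(self, text: str) -> bool:
        return self.log_score(text, "spam") > self.log_score(text, "ham")
```

Every false positive or false negative a user reports becomes another `train` call, which is why a large, shared pool of reports could help.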

So, what's good about Akismet?

  • A large sample of comment spams allows its Bayesian classifier to be thoroughly trained.
  • A centralised service, so Matt and co. can do all the fine tuning without touching your site. No more updates for algorithm changes.
  • A nice API that can be easily integrated into other blog tools. There might even be command-line tools that submit spam/ham in bulk.

So why would I probably not use it?

  • A centralised server. I hate latency, especially when my blogs are hosted halfway around the world from Akismet's central server.
  • A centralised service. Just imagine millions of WordPress blogs downloading this plugin and deploying it today, then sending millions of comments to this potentially CPU-intensive classification job...
  • A centralised, user-trained classification service. Although the FAQ says the classifier is unlikely to be poisoned (probably some kind of jail at the per-API-key level), I just don't feel right having some anonymous blogger moderate my comments.
  • I don't earn US$500 a month blogging, but I hope one day I will. (Currently projecting when un*x time(2) wraps around...)
  • But most importantly, I don't get spam. Well, rarely, to the point that it has never bothered me, since I require all first-time commenters to be moderated, which should be a default option in WordPress.
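The first-time-commenter rule mentioned in the last point is simple enough to sketch: hold a comment unless its author already has an approved comment. This toy version keys authors by normalised email address, which is an assumption, and the `ModerationQueue` class is hypothetical rather than how WordPress implements the setting.

```python
from dataclasses import dataclass, field


@dataclass
class ModerationQueue:
    """Hold comments from anyone without a previously approved comment."""
    approved_authors: set[str] = field(default_factory=set)
    pending: list[tuple[str, str]] = field(default_factory=list)

    def submit(self, email: str, text: str) -> str:
        key = email.strip().lower()  # assumption: authors identified by email
        if key in self.approved_authors:
            return "published"
        self.pending.append((key, text))
        return "held"

    def approve(self, index: int) -> None:
        # Approving a held comment marks its author as trusted from then on.
        key, _ = self.pending.pop(index)
        self.approved_authors.add(key)
```

A spammer would have to get one genuine-looking comment approved by hand before anything slips through automatically.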

Still, I applaud this great product. Not perfect, but probably the next best thing to that red button labelled "Kill All Spammers".

Update: Since moving the site to DreamHost, I have actually started using Akismet, and was surprised by the results: it is quite good. The centralisation issue still concerns me. Something like a Ping-o-matic outage can really stall blog posting, but fortunately the Akismet plugin has a good built-in timeout, so it gives up if the server is not responsive.
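The "give up if the server is not responsive" behaviour amounts to failing open with a timeout. Here is a minimal sketch of that pattern, assuming the reply is plain text `true`/`false` as in Akismet's public API docs; the function names and the "moderate" fallback are hypothetical choices, not the plugin's actual code.

```python
import urllib.request
from typing import Callable


def send_via_http(url: str, body: bytes) -> Callable[[float], str]:
    """Build a sender that POSTs body to url and returns the response text."""
    def send(timeout: float) -> str:
        req = urllib.request.Request(url, data=body, method="POST")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    return send


def classify(send: Callable[[float], str], timeout: float = 3.0) -> str:
    """Return 'spam', 'ham', or 'moderate' when the service can't be trusted."""
    try:
        body = send(timeout).strip()
    except OSError:
        # URLError, socket.timeout and TimeoutError are all OSError subclasses:
        # stop waiting on the remote service and queue the comment for a human.
        return "moderate"
    if body == "true":
        return "spam"
    if body == "false":
        return "ham"
    return "moderate"  # unexpected reply: don't guess
```

Passing the sender in as a callable keeps the timeout policy separate from the HTTP plumbing, so a slow central server delays a comment by at most `timeout` seconds instead of stalling the blog.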