Fighting WikiSpam: Eaton and Shared Blacklists

WikiSym 2005 was awesome. Massive props to Dirk Riehle and the program committee for throwing an outstanding event and drawing tons of great, great people. With Wikimania last August and WikiSym this past week, the Wiki community is really starting to gel. And it’s about time. Can you believe Wikis are 10 years old?    (JXD)

Now the bad news: I walked away with some action items. How do I get myself into these messes?!    (JXE)

The first action item can be traced back to an ad hoc meeting that happened at Wikimania regarding WikiSpam. On August 6, a group of Wiki developers — me (PurpleWiki), Alex Schroeder (OddMuse), Brion Vibber (Mediawiki), Thomas Waldmann (MoinMoin), Sven Dowideit (TWiki), Janne Jalkanen (JSPWiki) — along with John Breslin and Jochen Topf, got together to discuss ways we could collaborate on fighting WikiSpam. Our goal was to identify the simplest possible first step and not to get mired in process discussions.    (JXF)

Since all of us were already maintaining URL blacklists, we decided to merge them and host it as a Sourceforge project. We agreed on a standard format (which I’ll document and post soon), and we agreed to send our respective lists to Alex, who already has scripts to slice, dice, and merge.    (JXG)

One of my action items then was to create the Sourceforge project. I did that immediately, but for some reason, the project was rejected. Thus began a month-long go-around with Sourceforge support where I tried to discover why they had rejected the proposal. In the end, the project was approved, and I never got an answer as to why it was rejected in the first place. At that point, I was mired in other work, and so I never followed up.    (JXH)

WikiSym was the kick in the butt I needed to follow-up. On Sunday, Sunir Shah hosted an antispam workshop, which about 40 people attended. First, Sunir reviewed techniques (many of which are listed at MeatBall:WikiSpam). Then we broke out.    (JXI)

In my breakout, I described what we had agreed on at Wikimania. Then Peter Kaminski described a very cute idea he had for making it easy to fight WikiSpam. In a nutshell, Peter suggested we write a simple drop-in replacement CGI wrapper that would filter a POST payload for spam and call the real CGI script — be it a Wiki, a blog, or anything else — if the payload were spam-free. Such a wrapper would enable users to install spam-protection for any CGI script without having to write a single line of code and without having to do any complex configuration. It wouldn’t require any special access to your web server, since it would just be a CGI script. And you could easily add other spam-fighting measures, such as throttling and IP blacklists.    (JXJ)

I thought it was a brilliant idea. So Peter and I sat down afterwards and whipped it up. Took about an hour. It’s called Eaton, it works, and it’s Public Domain. Peter Kaminski has already blogged about it, and there’s some important commentary there from Jay Allen, the creator of MT-Blacklist.    (JXK)

It’s a proof of concept, and it won’t scale. It can and should be improved, and I’d encourage folks to do so. Nevertheless, it’s pretty cool. Bravo to Peter for a very clever idea.    (JXL)

By the way, the first person to figure out the origins of the name “Eaton” wins a cookie.    (JXM)

WikiMania Hackfest Day 4

Bits and tids:    (JM7)

  • I didn’t plan my Hacking Days schedule very well. I missed most of the first day, when the Mediawiki developers apparently made progress on a new metadata design. Days 2 and 3, from which I based most of my criticism, focused on servers and reliability, an area to which I really couldn’t contribute, not because I’m ignorant, but because I’m powerless. This morning, they discussed Single Sign-On and usability, two areas that I do know something about. Sadly, I missed these sessions, because I was too busy spouting on and on about how we really can save the world. Owen Davis, Fen Labalme, Kaliya Hamlin, and the rest of the gang will undoubtedly kick my butt when they read this. In my defense, I managed to talk a bit about Identity Commons later in the day. I also plugged the FLOSS Usability Sprint, and met Zeno Gantner, who’s done some usability studies on Mediawiki.    (JM8)
  • I was one of the featured participants for the afternoon “Wiki developers informal discussion,” along with Ward Cunningham, Sven Dowideit, Christophe Ducamp, and Brion Vibber. Domas Mituzas, Wikimedia Foundation‘s head of operations, asked Ward, “Why Camel Case?” I won’t go into the explanation here — I have a long interview with Ward, to be published eventually, that explains this in detail — but you should know that hating Camel Case is a running joke among this community. I laughed along with everyone else, but when Sven mentioned his desire to remove Camel Case from TWiki, I felt compelled to pipe up. I gave a balanced defense, describing Camel Case’s advantages over free links, but also acknowledging the appropriateness of free links in Wikipedia. Then I got a very amusing introduction to Erik Moeller, one of Mediawiki‘s core contributors and the Wikimedia Foundation‘s chief research officer. Erik had a strongly worded response. It got a bit heated, but never overly so, and I closed by saying that we were in violent agreement. We laughed about it over dinner, but then we got serious again. We also talked about Purple Numbers. I’ve explained many times why I may seem like a poor evangelist, but I think Erik was one of the few people who appreciated my perspective. He was clearly not a big fan of Purple Numbers — as it turns out, he was somewhat familiar with my work — but after hearing my explanation, he responded, “Only intelligent people are going to understand what you just said.” Fair enough. Fortunately, regular folks don’t need to get Granular Addressability for Granular Addressability to become ubiquitous.    (JM9)
  • A group of us broke out into a small group to discuss a Wiki Interchange Format, knowing full well that this is an issue that’s been discussed many times before (Wiki:WikiInterchangeFormat, MeatBall:WikiInterchangeFormat). Nevertheless, I think our discussion was not only constructive, it has a high chance of succeeding. See my summary.    (JMA)
  • Magnus Manske, the original creator of Mediawiki, participated in our Wiki Interchange Format discussion. He also mentioned a clever idea: a “shopping cart” where people could aggregate and possibly export Wiki pages they were interested in.    (JMB)
  • Sven Dowideit demonstrated the prototype WYSIWYG editor for TWiki, based on Kupu. He also showed a WikiText editor with real-time preview, which was pretty slick. Also, Ross Mayfield showed me a prototype editor for KWiki in response to my previous post. Very good to see these things.    (JMC)
  • So many people have come to this gathering to learn from others with different experiences. Granted, all of these experiences center around Wikipedia, but I’m still envious. My neverending quest is for folks interested in collaboration to look beyond their own narrow domains for deeper insights.    (JMD)