Fighting WikiSpam: Eaton and Shared Blacklists

WikiSym 2005 was awesome. Massive props to Dirk Riehle and the program committee for throwing an outstanding event and drawing tons of great, great people. With Wikimania last August and WikiSym this past week, the Wiki community is really starting to gel. And it’s about time. Can you believe Wikis are 10 years old?    (JXD)

Now the bad news: I walked away with some action items. How do I get myself into these messes?!    (JXE)

The first action item can be traced back to an ad hoc meeting that happened at Wikimania regarding WikiSpam. On August 6, a group of Wiki developers, namely me (PurpleWiki), Alex Schroeder (OddMuse), Brion Vibber (MediaWiki), Thomas Waldmann (MoinMoin), Sven Dowideit (TWiki), and Janne Jalkanen (JSPWiki), along with John Breslin and Jochen Topf, got together to discuss ways we could collaborate on fighting WikiSpam. Our goal was to identify the simplest possible first step and not to get mired in process discussions.    (JXF)

Since all of us were already maintaining URL blacklists, we decided to merge them and host the merged list as a SourceForge project. We agreed on a standard format (which I’ll document and post soon), and we agreed to send our respective lists to Alex, who already has scripts to slice, dice, and merge.    (JXG)
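To make the merge step concrete, here is a rough Python sketch. It is not Alex’s script, and the file format is purely an assumption for illustration (one URL pattern per line, with `#` starting a comment); the agreed format has not been posted yet. The sample patterns are made up.

```python
def merge_blacklists(*lists):
    """Merge several blacklists into one sorted, de-duplicated list.

    Assumed (hypothetical) format: one URL pattern per line; blank lines
    and anything after '#' are ignored.
    """
    merged = set()
    for text in lists:
        for line in text.splitlines():
            # Drop any trailing comment, then surrounding whitespace.
            pattern = line.split("#", 1)[0].strip()
            if pattern:
                merged.add(pattern)
    return sorted(merged)


# Made-up sample lists standing in for two Wikis' blacklists.
oddmuse = "casino-online\\.example\n# a comment line\ncheap-pills\\.example\n"
purplewiki = "cheap-pills\\.example  # also seen here\npoker\\.example\n"

print("\n".join(merge_blacklists(oddmuse, purplewiki)))
```

Using a set means a pattern contributed by several Wikis appears only once in the merged list.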

One of my action items then was to create the SourceForge project. I did that immediately, but for some reason, the project was rejected. Thus began a month-long go-around with SourceForge support where I tried to discover why they had rejected the proposal. In the end, the project was approved, and I never got an answer as to why it was rejected in the first place. At that point, I was mired in other work, and so I never followed up.    (JXH)

WikiSym was the kick in the butt I needed to follow up. On Sunday, Sunir Shah hosted an antispam workshop, which about 40 people attended. First, Sunir reviewed techniques (many of which are listed at MeatBall:WikiSpam). Then we broke out.    (JXI)

In my breakout, I described what we had agreed on at Wikimania. Then Peter Kaminski described a very cute idea he had for making it easy to fight WikiSpam. In a nutshell, Peter suggested we write a simple drop-in replacement CGI wrapper that would filter a POST payload for spam and call the real CGI script — be it a Wiki, a blog, or anything else — if the payload were spam-free. Such a wrapper would enable users to install spam-protection for any CGI script without having to write a single line of code and without having to do any complex configuration. It wouldn’t require any special access to your web server, since it would just be a CGI script. And you could easily add other spam-fighting measures, such as throttling and IP blacklists.    (JXJ)
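The wrapper idea is simple enough to sketch in a few lines. What follows is a Python illustration of the technique, not Eaton’s actual code. The blacklist entries, the function names, and the way the real script is invoked are all assumptions made for the example.

```python
import re
import subprocess
from urllib.parse import unquote_plus

# Hypothetical blacklist entries; a real deployment would load a shared list.
BLACKLIST = [r"cheap-pills\.example", r"casino-online\.example"]


def is_spam(payload, patterns=BLACKLIST):
    """Return True if the URL-decoded POST body matches any blacklist pattern."""
    text = unquote_plus(payload)
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def handle_post(payload, real_cgi, environ):
    """Call the real CGI script only when the payload is spam-free.

    payload  -- raw POST body as bytes
    real_cgi -- command line of the wrapped script (an assumed path)
    environ  -- CGI environment to hand to the wrapped script
    Returns the bytes to send back to the web server.
    """
    if is_spam(payload.decode("latin-1")):
        return (b"Status: 403 Forbidden\r\n"
                b"Content-Type: text/plain\r\n\r\n"
                b"Rejected as spam.\n")
    # Clean payload: replay it on the real script's stdin, unchanged.
    result = subprocess.run(real_cgi, input=payload, env=environ,
                            stdout=subprocess.PIPE, check=False)
    return result.stdout
```

In an actual CGI script, you would read `int(os.environ.get("CONTENT_LENGTH", 0))` bytes from `sys.stdin.buffer`, pass them to `handle_post` along with a copy of `os.environ`, and write the result to `sys.stdout.buffer`. Throttling or an IP blacklist would be one more check in front of the `subprocess.run` call.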

I thought it was a brilliant idea. So Peter and I sat down afterwards and whipped it up. Took about an hour. It’s called Eaton, it works, and it’s Public Domain. Peter Kaminski has already blogged about it, and there’s some important commentary there from Jay Allen, the creator of MT-Blacklist.    (JXK)

It’s a proof of concept, and it won’t scale. It can and should be improved, and I’d encourage folks to do so. Nevertheless, it’s pretty cool. Bravo to Peter for a very clever idea.    (JXL)

By the way, the first person to figure out the origins of the name “Eaton” wins a cookie.    (JXM)

Purple Numbers Are Ugly

Evan Henshaw-Plath thinks that Purple Numbers are ugly. He’s not the first.    (JW1)

A bit of history. The original Purple Numbers were a dark purple. Then Murray Altheim came up with a brilliant idea. Let’s make them lighter! So he did. That was better, but it wasn’t enough.    (JW2)

As Chris Dent and I started taking the identifier scheme to the next level, blogs started becoming popular. Permalinks, as rabble and others have pointed out, are granular addresses. Because our new identifier scheme was uglier by design (universally unique IDs can get quite large), we decided to jump on the blogging bandwagon and use hashes instead.    (JW3)

Then a funny thing happened. Peter Yim, a long-time Engelbart follower and an avid PurpleWiki user, complained. A lot. He said it was too hard to figure out what the links were. And as much as I tried to ignore him, I couldn’t. He was right. Or, more accurately, we were wrong. So we changed it back.    (JW4)

Our decision to go with hashes was wrong specifically in the context of PurpleWiki. Wikis are wonderful because you can Link As You Think. This is possible because page names are automatically linked, and it’s easy to remember page names. We added syntax for easily linking to Purple Numbers (for example, Purple Numbers). The problem was that when we replaced the addresses with hashes, we made it harder to Link As You Think.    (JW5)

With WikiSym coming up in a few days, I’ve had Wikis on my mind big-time, and I recently had an epiphany. It turns out I was wrong about being wrong.    (JW6)

The identifiers underlying Purple Numbers are designed to be stable, unique, and meaningless. In other words, not human-friendly. The notion of linking to Purple Numbers via our extended Wiki syntax isn’t tremendously useful, even if the identifiers are easily visible, because they’re impossible to remember. You’re rarely going to Link As You Think, because you’ll most likely have to go to the page to check what the number is. If you’re going to do that, you might as well just cut-and-paste the link, the status quo of the web.    (JW7)

Phil Jones almost stumbled onto something quite profound in his commentary last May, but he couldn’t quite put his finger on it, and Chris and I consequently jumped all over him. We were right, of course, but Phil was onto something.    (JW8)

In a Wiki context, here’s the right way to use Purple Numbers. By default, Purple Numbers should look pretty. So far, the best scheme I’ve seen for this is Simon Willison’s nifty CSS hack. If folks want to link to a Purple Number in a Wiki, they can do it the way you do it anywhere else: get the link address by clicking on the number (or hash or paragraph symbol or whatever), and copy-and-paste it into your document.    (JW9)

However, the Wiki should add an additional feature: the ability to add a human-friendly label to any paragraph. Some Wikis implement this capability by using special WikiText tags, but you should be able to implement this using AJAX goodness.    (JWA)

In other words, suppose you’re reading a Wiki page, and you find one paragraph particularly compelling. All you do is click on the paragraph and add your human-friendly tag. Let’s say the page is DeepThoughts and you enter the label indeed. The label should appear next to the paragraph, and anytime somebody wants to link to that paragraph, they just type DeepThoughts#indeed.    (JWB)

Here’s where it gets cute. Doing this feels like tagging. You’re just tagging granular content instead of documents… which is what Purple Numbers are designed to enable in the first place. So, make the label a tag across the entire Wiki. In other words, if you click on the label indeed, you get a search page showing all paragraphs on the Wiki that have been tagged indeed. If you really want to be cute, you can make it a Technorati Tag so it gets crawled.    (JWC)

This makes cosmic sense. Both Wikis and tagging work when labels are not unique — the exact opposite requirement of Purple Numbers. You want namespace clash, and Wikis and tagging give you that. This way, you get the best of all worlds. You still have the immutable, unique Purple Numbers, but now they’re not so ugly. You also get Link As You Think granular addresses and granular tagging. Everyone wins.    (JWD)
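Here is one way the labeling scheme described above might be modeled, as a small Python sketch. The class, its method names, and the node IDs are hypothetical; only the page name DeepThoughts and the label indeed come from the example. The point it illustrates is that labels are deliberately not unique, while each label still points back at a unique, stable paragraph identifier.

```python
from collections import defaultdict


class LabelIndex:
    """Maps human-friendly labels onto stable paragraph identifiers."""

    def __init__(self):
        # label -> list of (page, node_id); the same label may recur anywhere.
        self._by_label = defaultdict(list)

    def add_label(self, page, node_id, label):
        entry = (page, node_id)
        if entry not in self._by_label[label]:
            self._by_label[label].append(entry)

    def resolve(self, address):
        """Resolve 'Page#label' to the node IDs on that page with that label."""
        page, _, label = address.partition("#")
        return [nid for p, nid in self._by_label.get(label, []) if p == page]

    def tagged(self, label):
        """Every paragraph on the Wiki with this label (the tag search page)."""
        return list(self._by_label.get(label, []))


index = LabelIndex()
index.add_label("DeepThoughts", "A1", "indeed")   # node IDs are made up
index.add_label("OtherPage", "B7", "indeed")

print(index.resolve("DeepThoughts#indeed"))
print(index.tagged("indeed"))
```

`resolve` gives you the Link As You Think address, and `tagged` gives you the Wiki-wide tag view. Both sit on top of the same immutable identifiers.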

(By the way, rabble, looking forward to seeing pretty Purple Numbers in typo!)    (JWE)

Unit Tests: A Reliable Friend

Mike Mell and I were IMing today about a project we’re working on together, and he said off-hand that he’d like to be more rigorous about writing unit tests. His comment struck me, because I’ve been hacking PurpleWiki again after being away for literally months, and once again, unit testing has made my life considerably easier.    (JUO)

I’ve written about unit tests before, using PurpleWiki as an example. Writing them can be a serious pain, especially when you’re on a coding roll, and I’ve been known to cheat. But the more you refactor or code intermittently (as I’m prone to do), the less inclined you are to cheat in the future.    (JUP)

Unit tests are the ultimate in peace of mind. There’s just no reason not to use them.    (JUQ)

Kwiki::Purple, Wiki Deep Thoughts

My ex-partner-in-crime, Chris Dent, has been busy coding and expounding. Last month, he released Kwiki::Purple, a Purple Numbers plugin for Kwiki. You can play with it on his test site.    (IFZ)

This is fantastic news on a number of fronts. First, it’s further validation of our strategy to have the ideas take over the world, not the code. As I’ve said from the beginning, the purpose of PurpleWiki is to be a vehicle for ideas. Our goal was not for PurpleWiki to become the Wiki, but for other Wikis to steal our best ideas. There are now three Wikis with Purple Numbers (Kwiki, Zwiki, and PurpleWiki), with hopefully more to come.    (IG0)

Second, the fact that Chris was able to implement this as a Kwiki plugin makes Kwiki a more viable option for Blue Oxen Associates as its Wiki platform of the future. This is a good thing for many reasons.    (IG1)

Chris has also been doing some expounding on Wikis. His entry, “Why Wiki?”, is old hat for folks who know Chris, but it’s a nice, clean summary of his views for those who don’t. His framing of augmentation versus automation (discussed in much more detail in his paper, “The Computer As Tool: From Interaction to Augmentation”) is powerful, and I’ve borrowed it in my own thinking and writing. I also liked this line:    (IG2)

Architecting these sorts of tools may not solve poverty and hunger, or alleviate suffering in the aftermath of a disaster, but the tools can augment people actively doing that work. I happen to be good at making the tools go, so that’s where I look to fit myself into the puzzle.    (IG3)

Chris’s piece on Wikis as an external cache is another good one. Two quick comments. First, viewing Wikis as an external cache reveals an important constraint. They are most valuable to folks who are already immersed in a conversation, because those folks already have some context that aids them in exploring the Wiki. For example, Wikis used for self-documenting events are not so good at involving those who did not participate, but they are extremely valuable for those who did. At the same time, they are better than nothing. If someone is motivated enough, they can use Wikis as a springboard for acquiring the context they need, and thus gain value that way. Think Out Loud is good.    (IG4)

Second, Chris writes:    (IG5)

I’ve found that in order for outboard processing to work there’s several design and process guidelines that have to be reached. Here are some: interaction must be highly responsive, noise in the interface must be minimized, structural mechanics and metaphors in content need to be consisent, names must have value, it must be there when you want it, when there is a shared brain its context is shared as well (e.g when some members of the company have a discussion about design it it is done in an archivable fashion).    (IG6)

(Emphasis is mine.) It’s ironic that Chris cites highly responsive interaction as a requirement for collaboration on an asynchronous medium to work. I agree that this is an important pattern of effective collaboration, but I wouldn’t go so far as to say it’s a requirement. There is an alternative mode, one that emphasizes deep thinking augmented by infrequent, but deep interactions. Tools that augment this mode of interaction are a big void in the collaborative space. See my design notes on Abelard for an example of what such a tool might look like.    (IG7)

News Flash: Transclusions Already Ubiquitous!

No, this is not an advertisement for PurpleWiki (although PurpleWiki does support Transclusions). This is a wakeup call. Transclusions already exist and have existed for a long time. No, I’m not talking about Project Xanadu. I’m talking about the World Wide Web and spreadsheets, among others.    (IDZ)

First things first. What’s a Transclusion? A transclusion is a link where the content of the link is displayed inline. For example:    (IE0)

is a link. This:    (IE2)

   (IE3)

is the content of that link displayed inline. Which, of course, is example number one. Images on the Web are transclusions. When I include a URL in <img> tags, the content of that URL is displayed.    (IE4)

We use Transclusions all the time in spreadsheets. When I write =E27 in a cell, the spreadsheet displays the content of cell E27.    (IE5)
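The spreadsheet case fits in a few lines of code. This is only a toy Python sketch of the idea, not how any real spreadsheet works. The `render` function and the sample cells are made up, and it handles only bare references like =E27, not formulas.

```python
import re


def render(cells, ref, _seen=()):
    """Return what a cell displays, following '=REF' transclusions."""
    if ref in _seen:
        raise ValueError(f"circular reference at {ref}")
    value = cells.get(ref, "")
    # A bare cell reference means: display the other cell's content inline.
    if isinstance(value, str) and re.fullmatch(r"=[A-Z]+[0-9]+", value):
        return render(cells, value[1:], _seen + (ref,))
    return value


cells = {"E27": "42", "A1": "=E27", "B2": "=A1"}
print(render(cells, "A1"))
print(render(cells, "B2"))
```

The content lives in exactly one place (E27), and every cell that references it displays the current value rather than a copy.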

Transclusions are useful, and they’re ubiquitous, but not necessarily as “transclusions.” They’re not yet part of a shared conceptual framework for collaborative tools. Once we explicitly acknowledge their existence and their utility, we can think about implementing them across different applications in an interoperable way.    (IE6)