Wiki Standards

At WikiSym 2005, we had a BoF on Wiki Standards, organized by Stephan Schmidt, coauthor of SnipSnap. Discussion was spirited, as you might expect, but I think we accomplished a great deal.    (JY4)

We began by reviewing a list of things that could be standardized, which was old hat for a lot of folks who’ve been thinking about this stuff for a while. We quickly moved on, because we weren’t going to come to agreement in any short period of time. Instead, we decided to agree on a Neutral Space where we could continue the discussion.    (JY5)

That discussion was more controversial than I expected. MeatballWiki has been the primary forum for the Wiki community to talk about interoperability, and I, like others, didn’t see any reason for that to change. Some folks felt strongly that WikiSym would be a more neutral venue, so we ultimately decided to hold our standardization discussions there. My guess is that a lot of innovation will continue to happen on Meatball and other places, while the drafting and discussion of standards will happen at WikiSym.    (JY6)

So here’s the deal. If you’re interested in Wiki Standards — and that should be everyone in the Wiki community — subscribe to the wiki-standards mailing list. There’s already been some excellent discussion, and I think we’re going to see some real specs soon.    (JY7)

Fighting WikiSpam: Eaton and Shared Blacklists

WikiSym 2005 was awesome. Massive props to Dirk Riehle and the program committee for throwing an outstanding event and drawing tons of great, great people. With Wikimania last August and WikiSym this past week, the Wiki community is really starting to gel. And it’s about time. Can you believe Wikis are 10 years old?    (JXD)

Now the bad news: I walked away with some action items. How do I get myself into these messes?!    (JXE)

The first action item can be traced back to an ad hoc meeting that happened at Wikimania regarding WikiSpam. On August 6, a group of Wiki developers — me (PurpleWiki), Alex Schroeder (OddMuse), Brion Vibber (MediaWiki), Thomas Waldmann (MoinMoin), Sven Dowideit (TWiki), Janne Jalkanen (JSPWiki) — along with John Breslin and Jochen Topf, got together to discuss ways we could collaborate on fighting WikiSpam. Our goal was to identify the simplest possible first step and not to get mired in process discussions.    (JXF)

Since all of us were already maintaining URL blacklists, we decided to merge them and host the result as a SourceForge project. We agreed on a standard format (which I’ll document and post soon), and we agreed to send our respective lists to Alex, who already has scripts to slice, dice, and merge them.    (JXG)
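For the curious, here’s a rough sketch of what that merging step might look like. It assumes each blacklist is a plain text file with one URL-matching regular expression per line and “#” comments, which is only my guess at the format we agreed on; Alex’s actual scripts may work differently.

```python
#!/usr/bin/env python
# Rough sketch of merging several URL blacklists into one.
# Assumes one URL-matching regex per line, with '#' comments and
# blank lines ignored -- the actual agreed-upon format may differ.

import sys

def load_patterns(path):
    patterns = set()
    with open(path) as f:
        for line in f:
            line = line.split('#', 1)[0].strip()
            if line:
                patterns.add(line)
    return patterns

def main(paths):
    merged = set()
    for path in paths:
        merged |= load_patterns(path)
    # Sorting and deduplicating keeps the merged list diff-friendly.
    for pattern in sorted(merged):
        print(pattern)

if __name__ == '__main__':
    main(sys.argv[1:])
```

You’d run it as something like `python merge_blacklists.py oddmuse.txt moinmoin.txt > merged.txt` (the file names are made up).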

One of my action items, then, was to create the SourceForge project. I did that immediately, but for some reason, the project was rejected. Thus began a month-long go-around with SourceForge support where I tried to discover why they had rejected the proposal. In the end, the project was approved, and I never got an answer as to why it was rejected in the first place. At that point, I was mired in other work, and so I never followed up.    (JXH)

WikiSym was the kick in the butt I needed to follow up. On Sunday, Sunir Shah hosted an antispam workshop, which about 40 people attended. First, Sunir reviewed existing techniques (many of which are listed at MeatBall:WikiSpam). Then we broke out into groups.    (JXI)

In my breakout, I described what we had agreed on at Wikimania. Then Peter Kaminski described a very cute idea he had for making it easy to fight WikiSpam. In a nutshell, Peter suggested we write a simple drop-in replacement CGI wrapper that would filter a POST payload for spam and call the real CGI script — be it a Wiki, a blog, or anything else — only if the payload were spam-free. Such a wrapper would enable users to install spam protection for any CGI script without having to write a single line of code and without having to do any complex configuration. It wouldn’t require any special access to your web server, since it would just be a CGI script. And you could easily add other spam-fighting measures, such as throttling and IP blacklists.    (JXJ)
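To make the shape of the idea concrete, here’s a minimal sketch of such a wrapper in Python. This is not Eaton itself; the script name `wiki.cgi`, the blacklist file, and the regex-per-line blacklist format are placeholders I’ve assumed for illustration.

```python
#!/usr/bin/env python
# Minimal sketch of an Eaton-style CGI spam-filter wrapper.
# REAL_SCRIPT and BLACKLIST_FILE are placeholders -- point them at
# your actual CGI script and URL blacklist.

import os
import re
import subprocess
import sys

REAL_SCRIPT = './wiki.cgi'          # the wrapped CGI script (assumed path)
BLACKLIST_FILE = 'blacklist.txt'    # one URL regex per line (assumed format)

def load_blacklist(path):
    with open(path) as f:
        return [re.compile(line.strip(), re.IGNORECASE)
                for line in f if line.strip() and not line.startswith('#')]

def main():
    length = int(os.environ.get('CONTENT_LENGTH') or 0)
    payload = sys.stdin.read(length) if length else ''

    if any(p.search(payload) for p in load_blacklist(BLACKLIST_FILE)):
        # Spam detected: refuse the request instead of calling the real script.
        sys.stdout.write('Status: 403 Forbidden\r\n')
        sys.stdout.write('Content-Type: text/plain\r\n\r\n')
        sys.stdout.write('Post rejected: blacklisted URL detected.\n')
        return

    # Clean payload: hand the request off to the real CGI script,
    # passing along the same environment and POST body.
    result = subprocess.run([REAL_SCRIPT], input=payload.encode(),
                            env=os.environ, stdout=sys.stdout.buffer)
    sys.exit(result.returncode)

if __name__ == '__main__':
    main()
```

Because the wrapper passes the environment and POST body through untouched, the wrapped script behaves exactly as before for legitimate posts, and throttling or IP blacklisting could be bolted on at the same point where the URL check happens.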

I thought it was a brilliant idea. So Peter and I sat down afterwards and whipped it up. Took about an hour. It’s called Eaton, it works, and it’s Public Domain. Peter Kaminski has already blogged about it, and there’s some important commentary there from Jay Allen, the creator of MT-Blacklist.    (JXK)

It’s a proof of concept, and it won’t scale. It can and should be improved, and I’d encourage folks to do so. Nevertheless, it’s pretty cool. Bravo to Peter for a very clever idea.    (JXL)

By the way, the first person to figure out the origins of the name “Eaton” wins a cookie.    (JXM)

Fleischbutter

With WikiSym about to start, I want to close the loop on an obscure Wikimania item I posted last August. I mentioned something about Fleischbutter. Several folks emailed me asking what it was.    (JX8)

It’s exactly what it sounds like, folks. The literal translation is “meat butter.” More information is available here (thanks to Samuel Klein for spotting this). If folks have pictures of this legendary German dish, please post them there.    (JX9)

Speaking of legends, my friend Dave Arnold recently forwarded me a valuable resource regarding the Flying Spaghetti Monster. I encourage you all to spread the word.    (JXA)

Queer Numbers

At BAR Camp, I ran into Kragen Sitaker who had an idea for a variant on Purple Numbers called Queer Numbers. Kragen recently blogged the idea (spotted by Matthew O’Connor).    (JWJ)

In brief, Purple Numbers are wonderful, assuming the author has generated them. If the author hasn’t, you can use a proxy, such as PurpleSlurple. The problem with PurpleSlurple is that the addresses aren’t stable. If the author inserts a paragraph into the document, the PurpleSlurple address will point to the wrong place.    (JWK)

Queer Numbers solve this problem by generating (hopefully) stable identifiers from the content itself rather than from its position in the page. Using this kind of algorithm, you can address granular content on any page and feel fairly confident that the link will go to the right place. The level of confidence is still up in the air, as Kragen notes in his blog post.    (JWL)
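Kragen’s actual algorithm isn’t spelled out here, but to give a flavor of content-derived addressing, here’s a toy sketch: hash a lightly normalized version of each paragraph and use a prefix of the digest as its anchor. The normalization step and digest length are my own illustrative choices, not Kragen’s.

```python
# Toy sketch of content-derived granular identifiers, in the spirit
# of Queer Numbers. The normalization and digest length here are
# illustrative choices, not Kragen's actual algorithm.

import hashlib
import re

def paragraph_id(text, length=8):
    # Normalize whitespace and case so trivial edits don't change the ID.
    normalized = re.sub(r'\s+', ' ', text).strip().lower()
    return hashlib.sha1(normalized.encode('utf-8')).hexdigest()[:length]

document = """Purple Numbers give every chunk a granular address.

Inserting a paragraph above this one changes its position,
but not its content-derived identifier."""

for para in document.split('\n\n'):
    print(paragraph_id(para), '-', para.splitlines()[0])
```

Unlike a positional scheme such as PurpleSlurple’s, inserting a new paragraph above an existing one leaves the existing paragraph’s identifier unchanged; the open question, as Kragen notes, is how well the identifiers survive edits to the paragraph itself.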

Kragen referenced some work on lexical signatures for persistent naming of Web pages. (Ironically, Kragen didn’t have the link, and the original link is broken!) That work was Thomas Phelps and Robert Wilensky’s Robust Hyperlinks, and it’s good stuff.    (JWM)
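Roughly, Phelps and Wilensky’s idea is to attach a “lexical signature” of a handful of words that are common in the page but rare elsewhere, so the page can be re-found through a search engine if its URL breaks. Here’s a toy sketch of picking such a signature; scoring terms by simple TF-IDF over a small caller-supplied corpus is a simplification of their approach.

```python
# Toy sketch of a lexical signature: pick the few terms that best
# distinguish a document from a reference corpus. Scoring by simple
# TF-IDF over a caller-supplied corpus is a simplification of Phelps
# and Wilensky's method.

import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r'[a-z]+', text.lower())

def lexical_signature(doc, corpus, k=5):
    tf = Counter(tokenize(doc))
    df = Counter()
    for other in corpus:
        df.update(set(tokenize(other)))
    def score(term):
        return tf[term] * math.log((len(corpus) + 1) / (df[term] + 1))
    return sorted(tf, key=score, reverse=True)[:k]

corpus = ["the cat sat on the mat",
          "a quick brown fox jumps over the lazy dog"]
doc = "purple numbers give every paragraph of a document a stable granular address"
print(lexical_signature(doc, corpus))
```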

Some additional prior art: Doug Engelbart once told me that his lab had explored the idea of generating granular addresses through a hashing algorithm similar to Kragen’s. (Great minds think alike!) If I recall correctly, their algorithm was less sophisticated than Kragen’s, and I don’t think they got too far with the idea, but I’ll have to double-check with Doug to be sure.    (JWN)

About four years ago, I met a fellow named Alon Schwartz through Doug. Alon had founded an Israeli startup called BrowseUp, where he had independently come up with ideas such as granular linking and Backlinks, only to discover that Doug had thought of these ideas a half century earlier. Alon was delighted by this discovery and tried to convince Doug to join forces, but Doug wasn’t interested in getting involved with proprietary software, and BrowseUp eventually suffered the fate of most Dot Coms.    (JWO)

BrowseUp’s product was a proxy server and browser plugin that let you add granular linking, backlinks, and link types to existing web content. It was pretty cool, and it’s too bad it never got much attention. Alon used a hashing algorithm to generate unique granular addresses that he claimed were over 90 percent stable across different versions of a document. Of course, he wouldn’t tell me what the algorithm was, because the product was proprietary.    (JWP)

I think Kragen’s onto something good, and I hope he’ll turn his idea into code soon so that we can start playing with Queer Numbers in earnest.    (JWQ)