Queer Numbers

At BAR Camp, I ran into Kragen Sitaker who had an idea for a variant on Purple Numbers called Queer Numbers. Kragen recently blogged the idea (spotted by Matthew O’Connor).    (JWJ)

In brief, Purple Numbers are wonderful, assuming the author has generated them. If the author hasn’t, you can use a proxy, such as PurpleSlurple. The problem with PurpleSlurple is that the addresses aren’t stable. If the author inserts a paragraph into the document, the PurpleSlurple address will point to the wrong place.    (JWK)

Queer Numbers solve this problem by generating stable (maybe) identifiers based on some content analysis. Using this algorithm, you can address granular content on any page and feel fairly confident that the link will go to the right place. The level of confidence is still up in the air, as Kragen notes in his blog post.    (JWL)
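To make the contrast concrete, here is a minimal sketch in Python. It is not Kragen's algorithm (his post has the details). It simply hashes each paragraph's normalized text, and the function names `positional_ids` and `content_ids` are made up for illustration.

```python
import hashlib

def positional_ids(paragraphs):
    """Address each paragraph by its position, as a numbering proxy might."""
    return {str(i + 1): p for i, p in enumerate(paragraphs)}

def content_ids(paragraphs, length=8):
    """Address each paragraph by a short hash of its normalized text."""
    ids = {}
    for p in paragraphs:
        # Lowercase and collapse whitespace so trivial reformatting keeps the ID.
        normalized = " ".join(p.lower().split())
        digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:length]
        ids[digest] = p
    return ids

before = ["Purple Numbers are wonderful.", "Proxies are unstable."]
after = ["A new opening paragraph.", *before]  # the author inserts a paragraph

# Position "1" now points at different text.
moved = positional_ids(before)["1"] != positional_ids(after)["1"]

# Every content-based ID from the old version still resolves to the same text.
stable = all(content_ids(after).get(k) == v for k, v in content_ids(before).items())
```

A plain hash like this breaks as soon as the paragraph itself is edited, which is presumably part of why the level of confidence is still an open question.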

Kragen referenced some work on lexical signatures for persistent naming of Web pages. (Ironically, Kragen didn’t have the link, and the original link is broken!) That work was Thomas Phelps and Robert Wilensky’s Robust Hyperlinks, and it’s good stuff.    (JWM)

Some additional prior art: Doug Engelbart once told me that his lab had explored the idea of generating granular addresses through a hashing algorithm similar to Kragen’s. (Great minds think alike!) If I recall, their algorithm was less sophisticated than Kragen’s, and I don’t think they got too far with the idea, but I’ll have to double check with Doug to be sure.    (JWN)

About four years ago, I met a fellow named Alon Schwartz through Doug. Alon had founded an Israeli startup called BrowseUp, where he had independently come up with ideas such as granular linking and Backlinks, only to discover that Doug had thought of these ideas a half century earlier. Alon was delighted by this discovery and tried to convince Doug to join forces, but Doug wasn’t interested in getting involved with proprietary software, and BrowseUp eventually suffered the fate of most Dot Coms.    (JWO)

BrowseUp’s product was a proxy server and browser plugin that added granular linking, backlinks, and link types to existing web content. It was pretty cool, and it’s too bad it never got much attention. Alon used a hashing algorithm to generate unique granular addresses that he claimed were over 90 percent stable across different versions of a document. Of course, he wouldn’t tell me what the algorithm was, because the product was proprietary.    (JWP)

I think Kragen’s onto something good, and I hope he’ll turn his idea into code soon so that we can start playing with Queer Numbers in earnest.    (JWQ)

Stopover in Bloomington

On my way from Fort Wright to St. Louis, I stopped over in Bloomington to have lunch with Chris Dent and some of his colleagues, Joe Blaylock, Kevin Bohan, and Matthew O’Connor. Matthew is one of The Canonical Hackers.    (1YQ)

Lunch conversation was good — spent two hours longer than I had planned. I especially enjoyed meeting Matthew, as well as Paul Visscher and Jason Cook a few nights earlier. You can gather a surprising amount from interacting with folks via email alone, but it’s still only a partial picture. It was good to finally meet these guys in person, and to get a sense of their personalities and passions.    (1YR)

Arrived at Scott Foehner’s place in St. Louis at around 8:30pm. Had dinner on The Hill at an Italian restaurant called Via’s, then went to Milo’s, a neighborhood bar, for drinks. I was surprised to learn that folks in St. Louis brew beers other than Budweiser. Had a Schlafly’s there, which was very good.    (1YS)

Transclusions, Path-Based Addressing, and Version Control

The PurpleWiki community has been rumbling recently, thanks largely to the contributions of two members of the Canonical Hackers, Jason Cook and Matthew O’Connor. Jason wrote perplog, an IRC logger that supports Purple Numbers and Transclusions. Matthew hacked a PurpleWiki node manager, then started adding and fixing other stuff, including an XML-RPC interface. Additionally, John Sechrest developed an experimental email interface to PurpleWiki. Lots of great stuff. It’s forcing us to get off our butts and make some long-promised changes and explain some long-undocumented things. Open Source is a wonderful thing.    (117)

A lot of the excitement is because of PurpleWiki’s support for Transclusions. We had Transclusions in mind when we architected PurpleWiki, but we (or I, at least) didn’t think Transclusions would actually be implemented until much later. However, exactly one year ago today, Chris Dent got the itch and started playing. A few months later, Chris committed some code, and suddenly, we had Transclusions. It was a total hack, but it worked, and it was unexpectedly cool.    (118)

It’s still a hack, and it needs to be cleaned up, but it’s suddenly become a higher priority. First, we have a pretty good idea of how to support Transclusions “correctly.” Second, having had the chance to use Transclusions regularly, we are starting to recognize their utility and want to take greater advantage of that. (See, for example, my early specs for Abelard.) Third, people are starting to get excited about them.    (119)

Transcluding Multiple Chunks    (11A)

Currently, we support Transclusions of individual nodes (paragraphs, headers, list items, etc.) via the following syntax:    (11B)

  [t nid]    (11C)

where nid is the ID of the node you want to transclude. This works fine when you want to transclude small chunks, but at times, it’s useful to be able to transclude multiple chunks on a page. Rather than specify a transclusion for each individual node, it would be nice to have a syntax for specifying a collection of nodes in a single transclusion.    (11D)
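As a rough illustration of what single-node transclusion does, here is a small Python sketch. It is not PurpleWiki's actual code. The `NODES` dictionary and the `expand` function are hypothetical stand-ins for the real node store and parser.

```python
import re

# Hypothetical node store: NID -> node content.
NODES = {
    "1": "Plan for World Domination",
    "2": "Finish PurpleWiki.",
}

# Matches the single-node syntax: [t nid]
TRANSCLUSION = re.compile(r"\[t (\w+)\]")

def expand(line, nodes):
    """Replace each [t nid] with the node's content; leave unknown NIDs as-is."""
    def substitute(match):
        return nodes.get(match.group(1), match.group(0))
    return TRANSCLUSION.sub(substitute, line)

expanded = expand("* [t 2]", NODES)
```

Note that only the content is substituted. The surrounding list item stays put, which is exactly why a transclusion that carries its own structure is a different beast.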

Chris proposed the following syntax:    (11E)

  [t nid,nid,nid,...]    (11F)

This is problematic. The current implementation suggests that the transclusion command be replaced by the content identified by the specified NID. This proposal suggests that the command be replaced by both the content and the structure of the content. If you had a document like:    (11G)

  = Plan for World Domination {nid 1} =    (11H)
  # Finish PurpleWiki. {nid 2}    (11I)

and you tried to transclude this content with:    (11J)

  * [t 1, 2]    (11K)

what is the parser supposed to translate this to?    (11L)

A proper solution must treat a multi-node transclusion as its own structural element within a PurpleWiki document, rather than as inline content. More importantly, the syntax should capture document-specific context, which the current syntax ignores entirely.    (11M)

Why is document context important? Suppose you have the following task list:    (11N)

  = To Do {nid 3} =    (11O)
  * Buy milk. {nid 4}
  * Feed iguana. {nid 5}
  * Implement distributed Transclusions. {nid 6}
  * Vote in primaries. {nid 7}
  * Expose API to Backlinks. {nid 8}    (11P)

Suppose you want to start a PurpleWiki-specific task list, transcluding all of the list items relevant to PurpleWiki (in this case, nodes 6 and 8). The resulting document might look like:    (11Q)

  {title PurpleWikiToDo}    (11R)
  = PurpleWiki To Do {nid 9} =    (11S)
  * [t 6] {nid A}
  * [t 8] {nid B}    (11T)

Now, suppose you want to replicate this task list on another page. You could transclude all of the items individually, just as you do on the PurpleWiki To Do page. In this case, a slight variation of Chris’s proposed syntax (a standalone structural element) would simplify that process. (It also raises an interesting question: Which NIDs do you use for the transclusions: 6 and 8, or A and B? Does it make a difference?)    (11U)

However, what I really want to do is say, “Transclude all of the list items on the ‘PurpleWiki To Do’ page.” For this, you want something like XPointer:    (11V)

  [transclude PurpleWikiToDo#xpointer(id("9")/li)]    (11W)

A few observations: First, the command should be on a line by itself, and it should be interpreted as an independent structural element that will be replaced by a set of structural elements and content. I used “transclude” instead of “t” to make the point that these are two different commands. Second, the Transclusion command specifies a range of nodes within a document, as opposed to a document-independent list of nodes. Third, this combines a path-based address with an ID-based address.    (11X)

Fourth (and this is an implementation detail), if we want to support such syntax, it would behoove us to use an XML data model rather than the home grown model we’re currently using. This way, we could easily plug in existing XPointer implementations to do the queries.    (11Y)
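Here is a sketch of what that query could look like against an XML data model, using Python's standard `xml.etree.ElementTree`. The element names (`page`, `section`, `h`, `li`) are my assumptions, not PurpleWiki's data model, and ElementTree only supports a limited XPath subset rather than XPointer proper. The inner transclusions are shown already expanded.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the PurpleWikiToDo page.
PAGE = """
<page title="PurpleWikiToDo">
  <section id="9">
    <h>PurpleWiki To Do</h>
    <li id="A">Implement distributed Transclusions.</li>
    <li id="B">Expose API to Backlinks.</li>
  </section>
</page>
"""

def list_items(xml_text, node_id):
    """Return the text of every <li> directly under the element with this id."""
    root = ET.fromstring(xml_text)
    # ID-based address (the @id predicate) combined with a path step (/li).
    return [li.text for li in root.findall(f".//*[@id='{node_id}']/li")]

items = list_items(PAGE, "9")
```

If a third item is later added under node 9, the same query picks it up with no change to the transcluding page.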

Version Control    (11Z)

The fact that Wiki pages are dynamic throws a kink into all of this. The syntax I propose above takes this into account for the most part. Barring major changes to the PurpleWiki To Do page, the transcluded content will include all of the PurpleWiki tasks, even if more items are added later. If you had to list a set of NIDs, then you would have to be diligent about updating that list manually every time the To Do page changed.    (120)

In addition to supporting path-based addressing, we also need to allow people to specify versions in the address. In other words, you may want to transclude a specific version of a node or a set of nodes from a specific version of a document.    (121)

This shouldn’t be too difficult, but there are some complications. The biggie is whether to transclude a node that no longer exists in any document. My instinct right now says yes, but the transclusion should somehow make it clear that it’s an orphan node. (See my previous entry on link integrity for more on this.)    (122)
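One way to picture version-aware addressing with orphan handling is the Python sketch below. Everything in it (the `NodeHistory` class, the `resolve` function, the 1-based version numbers) is hypothetical and only meant to show the shape of the problem, not how PurpleWiki stores nodes.

```python
from dataclasses import dataclass, field

@dataclass
class NodeHistory:
    """All saved versions of one node, oldest first (hypothetical structure)."""
    versions: list = field(default_factory=list)
    in_document: bool = True  # False once no document contains the node

def resolve(store, nid, version=None):
    """Return (text, is_orphan). Versions are 1-based; None means the latest."""
    history = store[nid]
    index = len(history.versions) - 1 if version is None else version - 1
    return history.versions[index], not history.in_document

store = {
    "6": NodeHistory(["Implement Transclusions.", "Implement distributed Transclusions."]),
    "4": NodeHistory(["Buy milk."], in_document=False),  # removed from every page
}

latest = resolve(store, "6")             # follows the node as it changes
pinned = resolve(store, "6", version=1)  # frozen at a specific version
orphan = resolve(store, "4")             # still transcludable, but flagged
```

The `is_orphan` flag leaves it to the renderer to decide how to signal that the source node no longer lives in any document.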