At BAR Camp, I ran into Kragen Sitaker, who had an idea for a variant of Purple Numbers called Queer Numbers. Kragen recently blogged the idea (spotted by Matthew O’Connor). (JWJ)
In brief, Purple Numbers are wonderful, assuming the author has generated them. If the author hasn’t, you can use a proxy, such as PurpleSlurple. The problem with PurpleSlurple is that the addresses aren’t stable. If the author inserts a paragraph into the document, the PurpleSlurple address will point to the wrong place. (JWK)
Queer Numbers solve this problem by generating stable (maybe) identifiers based on some content analysis. Using this algorithm, you can address granular content on any page and feel fairly confident that the link will go to the right place. The level of confidence is still up in the air, as Kragen notes in his blog post. (JWL)
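To make the difference concrete, here is a minimal sketch contrasting position-based addresses with content-derived ones. It is not Kragen's algorithm: `positional_ids`, `content_id`, and `content_ids` are hypothetical helpers, and a truncated SHA-1 of the normalized paragraph text stands in for whatever content analysis Queer Numbers end up using.

```python
import hashlib


def positional_ids(paragraphs):
    """Address paragraphs by position (p1, p2, ...), as a position-based proxy might."""
    return {f"p{i}": text for i, text in enumerate(paragraphs, start=1)}


def content_id(text, length=8):
    """Hypothetical content-derived address: truncated hash of the normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:length]


def content_ids(paragraphs):
    """Map each paragraph's content-derived address to its text."""
    return {content_id(text): text for text in paragraphs}


old = [
    "Purple Numbers are wonderful.",
    "Proxies can add them.",
    "Stability matters.",
]
# The author inserts a new paragraph at the top of the document.
new = ["A new opening paragraph."] + old

target = "Stability matters."

# The positional address that pointed at the target now points elsewhere.
old_address = next(k for k, v in positional_ids(old).items() if v == target)
moved = positional_ids(new)[old_address]

# The content-derived address still resolves to the same paragraph.
still_there = content_ids(new)[content_id(target)]
```

A plain hash like this survives insertions elsewhere in the document, but it breaks as soon as the paragraph itself is edited, which is presumably why the real proposal leans on content analysis rather than an exact hash.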
Kragen referenced some work on lexical signatures for persistent naming of Web pages. (Ironically, Kragen didn’t have the link, and the original link is broken!) That work was Thomas Phelps and Robert Wilensky’s Robust Hyperlinks, and it’s good stuff. (JWM)
Some additional prior art: Doug Engelbart once told me that his lab had explored the idea of generating granular addresses through a hashing algorithm similar to Kragen’s. (Great minds think alike!) If I recall, their algorithm was less sophisticated than Kragen’s, and I don’t think they got too far with the idea, but I’ll have to double check with Doug to be sure. (JWN)
About four years ago, I met a fellow named Alon Schwartz through Doug. Alon had founded an Israeli startup called BrowseUp, where he had independently come up with ideas such as granular linking and Backlinks, only to discover that Doug had thought of these ideas a half century earlier. Alon was delighted by this discovery and tried to convince Doug to join forces, but Doug wasn’t interested in getting involved with proprietary software, and BrowseUp eventually suffered the fate of most Dot Coms. (JWO)
BrowseUp’s product was a proxy server and browser plugin that added granular linking, backlinks, and link types to existing web content. It was pretty cool, and it’s too bad it never got much attention. Alon used a hashing algorithm to generate unique granular addresses that he claimed were over 90 percent stable across different versions of a document. Of course, he wouldn’t tell me what the algorithm was, because the product was proprietary. (JWP)
I think Kragen’s onto something good, and I hope he’ll turn his idea into code soon so that we can start playing with Queer Numbers in earnest. (JWQ)
Wikipedia has a nice description of TF/IDF, which is at the heart of both Queer Numbers and Robust Hyperlinks: http://en.wikipedia.org/wiki/Tf-idf
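For readers who want to see the idea in code, here is a rough sketch of picking a TF/IDF-based "signature" for one paragraph. It is an illustration only, not the algorithm from Robust Hyperlinks or from Kragen's post: the `tokenize` and `lexical_signature` names, the smoothed IDF formula, and the choice of `k=3` terms are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_signature(target, corpus, k=3):
    """Return the k terms of `target` with the highest TF-IDF score against `corpus`.

    Terms that are frequent in the target but rare across the corpus score highest.
    Ties are broken alphabetically so the result is deterministic.
    """
    docs = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(docs)
    term_counts = Counter(tokenize(target))

    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in docs if term in doc)
        # Smoothed IDF (an assumption): avoids division by zero for unseen terms.
        idf = math.log((n_docs + 1) / (doc_freq + 1))
        scores[term] = count * idf

    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


paragraphs = [
    "Purple Numbers give every paragraph an address.",
    "A proxy can add an address to every paragraph.",
    "Lexical signatures pick rare words to name a page.",
]

signature = lexical_signature(paragraphs[2], paragraphs, k=3)
print(signature)
```

Common words such as "a" and "to" appear in more than one paragraph, so they score lower and drop out of the signature. The hope is that small edits to a paragraph leave its few most distinctive terms unchanged.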