Purple Numbers and Link Integrity

Danny Ayers is looking to implement Purple Numbers in his Wiki, and had the following question:    (66)

But is the expectation that the anchor will always refer to the same information item?    (67)

If we’re going for coolness, I think this may cause problems in the context of Wikis. Ok, pages come and go but the URI will (usually) always address something sensible – edit new page if the one originally addressed has gone.    (68)

But the Purple anchors are pointing to info-snippets that may be modified (no problem – it’s still conceptualy the same item) or be deleted (problem).    (69)

The expectation is that the information to which an anchor points may change. This is obviously not ideal.    (6A)

In March 2001, I wrote some notes for the Open Hyperdocument System entitled, “Thoughts on Link Integrity.” I had posted those notes to a mailing list, but those archives no longer exist, so I reproduce those thoughts below.    (6B)

Danny also mentioned using trackback as an aggregator of comments. There is such a system, which Chris Dent also mentions in his response: Internet Topic Exchange. We used it for the 2003 PlaNetwork Conference to aggregate blogs about the conference.    (6C)

Thoughts on Link Integrity    (6D)

We want the OHS to maintain link integrity across all documents. In other words, once you create a link to something, it should never break.    (6E)

The first requirement for link integrity is that documents are never deleted from the system. If you link to a document, and that document is subsequently removed, the link breaks. The only way to fix that link is to put the document back into the system.    (6F)

The second requirement is to have a logical naming scheme that is separate from the physical name and location of a document. On the web, if you have the document http://foo.com/bar.html, and you move it to http://foo.com/new/bar.html, links to the first URL break. You need a name for that document that will always point to the right place, even if the document is physically moved to a different part of the system.    (6G)

The third requirement is version control. This is where things start to get a little hairy. Version controlled systems are insert-only. In theory, nothing is ever removed. This satisfies the first requirement.    (6H)

However, in a useful DKR, links don’t just not break, they also evolve. Suppose you have a document, foo.txt, that contains the following text:    (6I)

  These are the dasy that try men's souls.    (6J)
  Example. foo.txt, version 1.    (6K)

Note that there’s a typo — “dasy” should be “days.”    (6L)

Now suppose someone creates a link to this sentence in this version of the document. Suppose that afterwards, you notice the typo and correct it. This results in a new version of the document:    (6M)

  These are the days that try men's souls.    (6N)
  Example. foo.txt, version 2.    (6O)

If your links neither broke nor evolved, then the original link would continue to point to version 1 of the document, not this new version. However, this does not always seem to be desirable behavior. If I created a link to this sentence — essentially designating it interesting and relevant content — when the typo is corrected, I’d prefer that the link now point to the corrected document, version 2.    (6P)

This is certainly doable. The system could automatically assume that the link pointing to the first sentence in version 1 should now point to the first sentence in version 2.    (6Q)

However, there are two scenarios when this would not be the correct behavior. First, what if, instead of fixing the typo, the sentence was changed to:    (6R)

  Livin' la vida loca.    (6S)
  Example. foo.txt, version 3.    (6T)

If the purpose of the link is to designate the target content as relevant, then the content of the first sentence of this third version no longer applies, because the meaning of the sentence has completely reversed.    (6U)

Second, what if the link is from an annotation that says, “There’s a typo in this sentence”? In this case, you would want the link to point only to version 1, since the typo does not exist in version 2 (and, for that matter, in version 3).    (6V)

How can we accomodate these scenarios? One solution would be to allow the user to define how the link should evolve with new versions of the document. So, for example, you could specify that the link that points to the first sentence in version 1 should also point to the first sentence in some number of subsequent versions of foo.txt.    (6W)

Another solution would be to have the system automatically notify everyone who has linked to a document (or who has otherwise registered for notification) that the document has changed, and have those people manually update their links, possibly providing suggestions as to how to update the links.    (6X)

Leave a Reply