An Introduction to Purple

Eugene Eric Kim <eekim@eekim.com>

Version 1.3
August 28, 2001


Introduction    1  (01)

Purple is a small suite of quickly hacked tools inspired by Doug Engelbart's attempt to bootstrap the addressing features of his Augment system onto HTML pages. Its purpose is simple: to produce HTML documents that can be addressed at the paragraph level. It does this by automatically creating named anchors with static and hierarchical addresses at the beginning of each text node, and by displaying these addresses as links at the end of each text node.    1A  (02)
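
The following is a minimal sketch of the kind of markup described above, written in Python for illustration only (Purple itself generates its HTML with an XSLT script). The exact markup, the `purple` class name, and the choice to point the trailing link at the static-address anchor are all assumptions, not Purple's actual output. `nid` stands for the static address and `hid` for the hierarchical one.

```python
from html import escape

def render_node(text: str, nid: str, hid: str) -> str:
    """Wrap one text node in Purple-style HTML (an illustrative sketch)."""
    # Named anchors at the start make the node reachable as page.html#<nid> or page.html#<hid>.
    anchors = f'<a name="{escape(nid)}"></a><a name="{escape(hid)}"></a>'
    # The trailing "purple number": both addresses shown as a link to the static anchor.
    purple = f'<a class="purple" href="#{escape(nid)}">{escape(hid)}&nbsp;({escape(nid)})</a>'
    return f"<p>{anchors}{escape(text)} {purple}</p>"

print(render_node("Purple is a small suite of tools.", "02", "1A"))
```

Because the anchors sit at the start of the node, following a link lands the reader at the beginning of the paragraph rather than at its purple number.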

Purple consists of an XML DTD, an XSLT script for transforming the XML into HTML, various Perl scripts for preprocessing the XML, and a CGI script for displaying different versions of an HTML file. Its name was inspired by the little purple numbers found at the Bootstrap Institute's web site.    1B  (03)

Purple is a makeshift solution to a relatively simple problem, and will hopefully be rendered obsolete very quickly by preliminary versions of Doug Engelbart's Open Hyperdocument System (OHS).    1C  (04)

However, developing Purple was also an excuse to play with XML and its associated technologies, such as the DOM and XSLT, and to think about the methodologies that will best take advantage of tools such as the OHS.    1D  (05)

Granular Addressability in HTML Documents    2  (06)

HTML lets you link to part of a document only if a target has been defined there (using the <a name=""> tag). Where no target is defined, there is no way to address that part of the document.    2A  (07)

This is a serious deficiency for the World Wide Web as a system for collaboration and knowledge management, one that the World Wide Web Consortium is addressing with XLink. However, until XML and XLink become widespread, the problem is not going to go away.    2B  (08)

Engelbart's makeshift solution was to bootstrap some of the addressability features of his Augment system onto HTML. On documents at Engelbart's Bootstrap Institute web site, webmasters have added HTML anchors to each chunk of text with names corresponding to Augment addresses. These addresses are displayed as links at the end of each text chunk. An example of this is the HTML version of this very document.    2C  (09)

The result is crude but adequate, and it works on existing systems. If you want to link to a paragraph or a list item, you use its purple number as the address. Because the purple numbers are links, you can copy and paste the link into your own document rather than typing it from scratch.    2D  (010)

These anchors can also be used to attach metadata or topics to granular chunks of HTML documents, using RDF or Topic Maps, respectively. RDF triples and Topic Map occurrences let you define relationships between resources identified by URIs. With Purple, you can insert the URL of a paragraph or list item into the appropriate RDF or Topic Map attribute.    2E  (044)
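
As an illustration, a pair of hypothetical Python helpers can turn a page URL and a node's static address into a paragraph URI, then use that URI as the subject of an RDF statement in N-Triples form. The example.com URL is made up, and the Dublin Core `subject` predicate is chosen only as an example.

```python
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"   # Dublin Core "subject" element

def paragraph_uri(doc_url: str, nid: str) -> str:
    """URI of one paragraph: the page URL plus the node's named anchor."""
    base = doc_url.split("#", 1)[0]        # drop any fragment already on the URL
    return f"{base}#{nid}"

def ntriple(subject: str, predicate: str, literal: str) -> str:
    """One RDF statement in N-Triples syntax with a plain string literal as object."""
    # Escape the characters N-Triples does not allow raw inside a string literal.
    escaped = (literal.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'<{subject}> <{predicate}> "{escaped}" .'

uri = paragraph_uri("http://www.example.com/purple.html", "044")
print(ntriple(uri, DC_SUBJECT, "granular metadata"))
# <http://www.example.com/purple.html#044> <http://purl.org/dc/elements/1.1/subject> "granular metadata" .
```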

The purple numbers were especially nice for Augment documents converted to HTML, because the purple numbers corresponded exactly to the addresses used in the Augment system. However, there are a number of deficiencies:    2F  (011)

This last item is the true motivation for Purple.    2H  (015)

Tools    3  (016)

Purple consists of the following tools:    3A  (017)

(A tarball containing all of these tools and more is available.)    3C  (024)

Each of these tools is self-documenting and includes a to-do list. Whether any of the items on these lists will ever get done is questionable. Remember, these were all quick hacks designed to get some stuff done and out there. Rather than spend time turning these tools into full-fledged systems, I'd rather apply the lessons learned to the OHS, which will make all this stuff obsolete.    3D  (025)

Methodology    4  (027)

Purple was designed to provide some semblance of link integrity by supporting a crude form of versioning. Augment-style hierarchical addresses work great for published documents, because the documents are fairly static. However, they are not very helpful for dynamic documents, because as items get moved around, links to a hierarchical address will potentially point to the wrong item.    4A  (028)

Augment-style statement IDs (NIDs) are better references for dynamic documents, because even if an item's location changes in new versions of a document, the link will continue to point to the correct item. However, manually adding unique NIDs to each text node is gruesome.    4B  (029)
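
Here is a toy illustration of the difference. It assumes a single section whose paragraphs are lettered 1A, 1B, and so on in document order. Swapping two paragraphs changes what the hierarchical address `1A` refers to, while the statement ID still names the same paragraph. The helper names are hypothetical.

```python
import string

def hid_for(order, nid, section=1):
    """Hierarchical address of `nid` given the current paragraph order (at most 26 paragraphs)."""
    return f"{section}{string.ascii_uppercase[order.index(nid)]}"

def nid_at(order, hid, section=1):
    """Resolve a hierarchical address to whichever paragraph currently sits at that position."""
    letter = hid[len(str(section)):]
    return order[string.ascii_uppercase.index(letter)]

v1 = ["011", "015", "016"]   # statement IDs in document order, version 1
v2 = ["015", "011", "016"]   # version 2: the first two paragraphs swapped

saved_hid = hid_for(v1, "011")   # a reader links to "1A" in version 1
print(nid_at(v1, saved_hid))     # 011
print(nid_at(v2, saved_hid))     # 015: the hierarchical link now hits the wrong paragraph
print(hid_for(v2, "011"))        # 1B: the statement ID still finds the original paragraph
```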

Ideally, versioning of nodes should be well-integrated in the document editing system, which is one of the planned features of the OHS. However, until that happens, we need a temporary solution.    4C  (030)

The way I use Purple is as follows. I write a document that conforms to purple.dtd, ignoring both NIDs and hierarchical addresses (HIDs). When I'm done with a first version of that document, I run add_ids.pl on it. The script walks through the XML file, assigns a unique NID to each node, then computes each node's proper HID and adds that as well. I then publish a version of that file by converting it to HTML and registering it with dkr.    4D  (031)
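
The HID half of that step can be sketched as a recursive walk. This Python version assumes the Augment convention visible in this document's own purple numbers: digits at the top level (1, 2), letters at the next level (1A, 1B), digits again below that (1A1), and so on. The document does not say how letters continue past Z, so the sketch assumes A..Z, AA, AB, and onward. The `Node` class is a hypothetical stand-in for the XML elements that add_ids.pl actually processes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    text: str
    hid: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

def letters(n: int) -> str:
    """1 -> A, 26 -> Z, 27 -> AA (bijective base 26; the scheme past Z is an assumption)."""
    out = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        out = chr(ord("A") + r) + out
    return out

def assign_hids(nodes: List[Node], prefix: str = "", depth: int = 0) -> None:
    """Recompute hierarchical addresses: digits at even depths, letters at odd depths."""
    for i, node in enumerate(nodes, start=1):
        node.hid = prefix + (str(i) if depth % 2 == 0 else letters(i))
        assign_hids(node.children, node.hid, depth + 1)

doc = [Node("Introduction", children=[Node("para"), Node("para", children=[Node("item")])]),
       Node("Granular Addressability")]
assign_hids(doc)
print([doc[0].hid, doc[0].children[1].hid, doc[0].children[1].children[0].hid, doc[1].hid])
# ['1', '1B', '1B1', '2']
```

Since HIDs depend only on position, they can be thrown away and recomputed from scratch on every run.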

When I subsequently edit that XML file, I don't touch any of the address attributes. So if I move a node, that node's NID moves with it. If I create a new node, again, I ignore addresses. When I'm ready to publish again, I run add_ids.pl again. This time, the script leaves nodes that already have NIDs alone and assigns new NIDs to the nodes without them, numbering upward from the largest NID already in use. It then recomputes the proper HIDs for all of the nodes. The document is now ready to be published as a new version.    4E  (032)
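
The incremental NID pass might look roughly like the following in Python. Nodes are modeled as hypothetical dictionaries with an optional integer `nid` and a `children` list. The sketch interprets the new NIDs as continuing upward from one past the largest NID already in use, and it leaves HID recomputation to a separate pass.

```python
from typing import Iterator, List

def walk(nodes: List[dict]) -> Iterator[dict]:
    """Yield every node in document order (pre-order traversal)."""
    for node in nodes:
        yield node
        yield from walk(node.get("children", []))

def add_nids(nodes: List[dict]) -> None:
    """Leave existing NIDs alone; give new nodes fresh NIDs above the current maximum."""
    used = [n["nid"] for n in walk(nodes) if n.get("nid") is not None]
    next_nid = max(used, default=0) + 1
    for node in walk(nodes):
        if node.get("nid") is None:
            node["nid"] = next_nid      # moved nodes keep their NIDs; only new nodes get one
            next_nid += 1

# Version 2 of a document: node 3 was moved above node 2, and two nodes were added.
doc = [{"nid": 1, "children": [{"nid": 3}, {}, {"nid": 2}]}, {}]
add_nids(doc)
print([n["nid"] for n in walk(doc)])    # [1, 3, 4, 2, 5]
```

Running the pass a second time changes nothing, which is what allows it to be rerun before every publication.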

This provides some semblance of link integrity, because links pointing to a node that has moved will still point to the right place. However, the integrity is far from perfect.    4F  (033)

I wrote dkr to address this deficiency. dkr lets you register new versions of a Web document and displays any registered version of that document. By linking to a specific version of a document through dkr, you can guarantee link integrity. Additionally, through a little bit of trickery not yet implemented, dkr could help you track and update your links to stay current with new versions in a semi-automated way.    4H  (036)
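
dkr itself is a CGI script. The toy Python class below only illustrates the idea of registering versions and pinning links to one of them. The class, the endpoint URL, and the `doc` and `version` query parameters are all hypothetical and are not dkr's actual interface.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

class VersionRegistry:
    """Toy stand-in for dkr: keeps every registered version of each document."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url                  # hypothetical CGI endpoint
        self.versions: Dict[str, List[str]] = {}

    def register(self, doc: str, html: str) -> int:
        """Store a new version of `doc` and return its version number (1, 2, ...)."""
        self.versions.setdefault(doc, []).append(html)
        return len(self.versions[doc])

    def get(self, doc: str, version: Optional[int] = None) -> str:
        """Return the requested version, or the latest one if no version is given."""
        history = self.versions[doc]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{doc} has no version {version}")
        return history[version - 1]

    def link(self, doc: str, version: int, nid: str) -> str:
        """Build a link pinned to one version and one node of `doc`."""
        return f"{self.base_url}?{urlencode({'doc': doc, 'version': version})}#{nid}"

registry = VersionRegistry("http://www.example.com/cgi-bin/dkr")
registry.register("purple.html", "<p>first draft</p>")
registry.register("purple.html", "<p>second draft</p>")
print(registry.link("purple.html", 1, "033"))
# http://www.example.com/cgi-bin/dkr?doc=purple.html&version=1#033
```

Old versions are never overwritten, so a link built this way keeps resolving to the text it was made against.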

One problem with developing documents in Purple is creating links between documents before any addresses have been added to their nodes, because a node has no NID until add_ids.pl has been run on it. I get around this by writing skeleton links with placeholders of my own devising. Once my documents are ready, I run add_ids.pl and manually replace the placeholders with the actual node addresses.    4I  (037)
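
The author describes replacing placeholders by hand. If one wanted to script that step, a sketch might look like the following. The `[[label]]` placeholder syntax and the label-to-NID table are hypothetical, and the table could only be filled in after add_ids.pl has run.

```python
import re
from typing import Dict

# Hypothetical placeholder syntax: [[label]], where label is a name chosen while drafting.
PLACEHOLDER = re.compile(r"\[\[([A-Za-z0-9_-]+)\]\]")

def resolve_placeholders(text: str, addresses: Dict[str, str]) -> str:
    """Replace each [[label]] with its real address; leave unknown labels untouched."""
    def substitute(match):
        return addresses.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, text)

draft = 'See <a href="purple.html#[[tools]]">the tools</a> and [[todo]].'
print(resolve_placeholders(draft, {"tools": "016"}))
# See <a href="purple.html#016">the tools</a> and [[todo]].
```

Unresolved labels are left visible on purpose, so a missing address shows up in the output rather than silently producing a broken link.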

Colophon    5  (038)

In the spirit of bootstrapping, this document was developed using Purple.    5A  (039)

I used James Clark's xt XSLT processor and xp XML parser to generate the HTML files.    5B  (040)

All of the scripts are written in Perl. I used the XML::DOM module (which can be found at http://search.cpan.org/ and which uses James Clark's expat XML parser) to manipulate the XML files.    5C  (041)

My XML editor of choice is XEmacs in psgml mode.    5D  (042)

Purple is open source under the BSD License. I encourage you to do whatever you want with it. Whatever you do, I'd like to hear about it.    5E  (043)