Augment.pm - class for accessing/converting Augment files
    use Augment;

    my $a = new Augment 'file.txt';
    $a->writeHTML;
    print "file.txt converted and written to " . $a->getHTMLfilename;
    $a->destroy;
Parses an exported Augment file (generated by Doug Engelbart from his Augment system) into a Perl object that can be converted into other formats. Currently, HTML is the only supported output format.
The exported Augment file contains formatted ASCII text that is automatically generated by one of Doug's Augment scripts, with some manual tweaking here and there. Doug's script does the following.
First, it puts the Augment filename on the first line of the exported file and the date of the file on the third line. Directories in the filename are delimited by commas. The exception is the trailing comma, which can be used to add addressing information. For example:
AUGMENT,132505,
refers to the file 132505 in the AUGMENT directory.
Second, it formats Augment addressing information (hierarchical address, statement ID (SID), and optional label) and prepends it to the statement. The resulting string is delimited by colons, and is separated from the statement by a pound sign. For example:
12B:0177:Reference-2#Reference-2: A reference.
The string preceding the pound sign is the address string, with a hierarchical address of 12B, a statement ID of 0177, and the label ``Reference-2''.
Hierarchical addresses could easily be computed by a2h.pl, but since that information is already in Augment, we figured we might as well take advantage of it. Augment statement IDs are always prefixed by a zero. The third field, the label, is optional.
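To make the address string format concrete, here is a minimal sketch of how one exported statement could be split into its fields. The function name parse_statement_line and the regular expression are my assumptions for illustration; they are not taken from Augment.pm, and the real parser may be stricter or looser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Augment.pm): split one exported
# statement into its address fields and its text.
sub parse_statement_line {
    my ($line) = @_;

    # address : SID (leading zero) : optional label # statement text
    # Leading whitespace is the indentation added by the export script.
    return unless $line =~ /^\s*([0-9A-Za-z]+):(0\d+):([^#:]*)#(.*)$/s;

    return {
        address => $1,
        sid     => $2,
        label   => $3,   # empty string when the statement has no label
        data    => $4,
    };
}

my $st = parse_statement_line('12B:0177:Reference-2#Reference-2: A reference.');
print "$st->{address} / $st->{sid} / $st->{label} / $st->{data}\n";
```

Only the first pound sign ends the address string, so colons and pound signs inside the statement text are left alone.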
Third, it indents statements by multiples of three spaces and word-wraps them at 80 columns. It typically separates statements by a blank line. However, Augment does not treat newlines as statement delimiters. In other words, you cannot assume that paragraphs separated by blank lines are unique statements. Because of this behavior, the only way to identify a unique statement in an exported Augment file is by the address string that precedes it.
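The rule above (a statement starts at an address string and nowhere else) can be sketched as follows. This is an illustration under assumptions, not the module's code: split_statements and $ADDRESS_RE are names I made up, and the pattern is inferred from the examples in this document.

```perl
use strict;
use warnings;

# Assumed shape of an address string at the start of a statement.
my $ADDRESS_RE = qr/^\s*[0-9A-Za-z]+:0\d+:[^#:]*#/;

# Hypothetical helper: group the lines of an export into statements.
sub split_statements {
    my @lines = @_;
    my @statements;

    for my $line (@lines) {
        if ($line =~ $ADDRESS_RE) {
            push @statements, $line;         # a new statement starts here
        }
        elsif (@statements) {
            $statements[-1] .= "\n$line";    # continuation, blank lines included
        }
        # Lines before the first address string (such as the filename
        # and date lines) are skipped.
    }

    s/\s+\z// for @statements;               # drop trailing blank lines
    return @statements;
}

my @statements = split_statements(
    '1A:03:#This is the introductory paragraph, wrapped',
    '   onto a second line.',
    '',
    '   A second paragraph that belongs to the same statement.',
    '',
    '1B:04:#This is the second statement.',
);
print scalar(@statements), " statements\n";
```

A wrapped line that happens to begin with something shaped like an address string would be misread as a new statement; the sketch does not try to guard against that.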
Some Augment files include directives meant primarily for printed output. They generally look something like:
.SnfShow=Off;
Doug's script is supposed to turn these off before outputting text, but it doesn't look like he's made that modification yet, so this script tries to identify and strip these directives itself.
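Stripping the directives could look roughly like this. The pattern is a guess based on the single example above (a period, a name, an equals sign, a value, and a semicolon), and strip_directives is a hypothetical name; the actual module may recognize directives differently.

```perl
use strict;
use warnings;

# Hypothetical helper: remove print directives such as ".SnfShow=Off;".
# The pattern is inferred from one example and may miss other directives.
sub strip_directives {
    my ($text) = @_;
    $text =~ s/\.[A-Za-z]\w*=[^;\s]*;//g;
    return $text;
}

print strip_directives('.SnfShow=Off;Introduction'), "\n";
```

A pattern this loose can also remove ordinary text that happens to look like a directive, so it is worth checking against real exports before relying on it.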
This is what an exported Augment file might look like:
0:01:#Sample Augment file

1:02:Introduction#Introduction

   1A:03:#This is the introductory paragraph.

   1B:04:#This is the second paragraph.
a2h.pl parses the file into a hash that stores some metadata and a reference to the parse tree. The keys to the hash are:
    'filename'   => Augment filename,
    'date'       => publication date,
    'statements' => reference to parse tree
The parse tree consists of a reference to an array of references pointing either to a statement hash or to another array of references. The statement hash contains the following:
    $statement = {
        'address' => hierarchical address,
        'sid'     => statement ID,
        'label'   => optional label,
        'data'    => statement data
    };
The following is a populated $statements data structure corresponding to the sample exported Augment file shown above.
    $statements = [
        {
            'address' => '0',
            'sid'     => '01',
            'label'   => '',
            'data'    => 'Sample Augment file'
        },
        {
            'address' => '1',
            'sid'     => '02',
            'label'   => 'Introduction',
            'data'    => 'Introduction'
        },
        [
            {
                'address' => '1A',
                'sid'     => '03',
                'label'   => '',
                'data'    => 'This is the introductory paragraph.'
            },
            {
                'address' => '1B',
                'sid'     => '04',
                'label'   => '',
                'data'    => 'This is the second paragraph.'
            },
        ]
    ];
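Since the parse tree mixes statement hashes and nested arrays, any code that consumes it has to check which kind of reference it is looking at. Here is a minimal sketch of a depth-first walk over the structure; walk and its callback argument are hypothetical names, not methods of Augment.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: visit every statement in the parse tree, depth first.
sub walk {
    my ($statements, $depth, $visit) = @_;
    for my $node (@$statements) {
        if (ref $node eq 'ARRAY') {
            walk($node, $depth + 1, $visit);   # nested level
        }
        else {
            $visit->($node, $depth);           # statement hash
        }
    }
}

my $statements = [
    { address => '0', sid => '01', label => '', data => 'Sample Augment file' },
    { address => '1', sid => '02', label => 'Introduction', data => 'Introduction' },
    [
        { address => '1A', sid => '03', label => '', data => 'This is the introductory paragraph.' },
        { address => '1B', sid => '04', label => '', data => 'This is the second paragraph.' },
    ],
];

# Print each statement, indented three spaces per nesting level.
walk($statements, 0, sub {
    my ($st, $depth) = @_;
    print(('   ' x $depth) . "$st->{address} $st->{data}\n");
});
```

Passing the visitor as a code reference keeps the traversal separate from whatever is done with each statement, whether that is printing HTML or collecting addresses.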
pre:  $fname - name of exported Augment file
      $options{'html_files_dir'} - directory where converted HTML files go. If not specified, set to $_HTML_FILES_DIR.
post: none
Augment constructor. Reads an exported Augment file, and parses it into an internal data structure.
pre:  $a_fname - Augment filename.
post: none
Generates HTML from the parse tree, converts Augment filename to HTML filename, and writes the HTML file.
pre:  none
post: none
Returns the corresponding HTML filename of an Augment file.
pre:  none
post: none
Augment destructor. Undefines the Augment object, thus allowing the garbage collector to free its memory.
pre:  $input_file - name of exported Augment file to convert.
post: \%parse_tree - statements + metadata (described above).
Parses the input file, and returns metadata about the file, and the parse tree populated with data from the parsed file.
pre:  $statements - parse tree (w/o root metadata)
      $fh - file handle to where converted HTML is printed.
      $indent_level - current indentation level. Used to determine which HTML header style to print.
post: none
Recursive function that traverses the parse tree and prints HTML statements with appropriate addresses, indentation, and other special formatting (such as the infamous ``purple'' numbers).
pre:  $fh - file handle to where converted HTML is printed.
      $fname - name of Augment file
      $date - date of Augment file
post: none
Prints the HTML header tags with embedded stylesheet and other metadata.
pre:  $fh - file handle to where converted HTML is printed.
post: none
Prints the HTML footer tags.
pre:  $a_fname - original Augment filename
post: $a_fname - converted filename
Converts an Augment filename to a Web/UNIX-friendly filename by replacing the trailing comma with '.html' and all other commas with forward slashes.
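The conversion described above amounts to two substitutions. A minimal sketch, using the hypothetical name convert_filename rather than whatever the module calls this function internally:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the filename conversion described above.
sub convert_filename {
    my ($a_fname) = @_;
    $a_fname =~ s/,\z/.html/;   # trailing comma becomes the extension
    $a_fname =~ tr{,}{/};       # remaining commas become path separators
    return $a_fname;
}

print convert_filename('AUGMENT,132505,'), "\n";   # prints AUGMENT/132505.html
```

The order matters: the trailing comma has to be replaced first, or it would be turned into a forward slash along with the others.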
pre:  $path - fully qualified UNIX path and filename
post: none
Creates the appropriate directories if they do not already exist.
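With a current Perl, the same effect can be had from the core File::Path and File::Basename modules. This is a sketch of that approach, not a description of how Augment.pm does it; ensure_parent_dirs is a hypothetical name, and the example writes into a temporary directory so it can be run safely.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical stand-in: create the directory part of $path if needed.
sub ensure_parent_dirs {
    my ($path) = @_;
    my $dir = dirname($path);
    make_path($dir) unless -d $dir;   # also creates intermediate directories
    return $dir;
}

my $root = tempdir(CLEANUP => 1);
ensure_parent_dirs("$root/AUGMENT/132505.html");
print((-d "$root/AUGMENT") ? "created\n" : "missing\n");
```

Only the directories are created; writing the HTML file itself is left to the caller.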
Augment.pm does very little error handling. This is a bad thing.
It might be nice to have a generic write() method and to separate the HTML-related methods into a subclass. Then, anytime someone wanted to write a new conversion module, that person could just subclass Augment and override write().
There's not a great need for this. It's fairly straightforward to add new methods to the class as it currently stands.
It might be nice to have some generic functions for manipulating the parse tree, perhaps a traverse() method. However, as I said before, there's no great need for this right now; time is better spent on other areas, especially the ones listed below.
This version of a2h.pl does not convert Augment links to HTML links. This is nontrivial for a number of reasons. Syntactically, any text in an Augment file delimited by parentheses or angle brackets is potentially a link. (At some point, Doug's Augment team standardized on angle brackets for their link format, but some documents still use parentheses.) In Augment, if you pointed to text delimited this way and tried to jump to it, Augment would go there if the text was a valid link (i.e. an entry in the link database); otherwise, Augment would just ignore the command.
In order to do Augment link conversion, this script should assemble all of the legal addresses within this document and store them in a link database. It should then identify anything that looks like a link, and search the database for such a link. If that link exists, then it should create the appropriate HTML link.
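Here is a rough sketch of that two-step approach, operating on the parse tree format described earlier. Everything in it is an assumption for illustration: build_link_db and find_links are hypothetical names, the "database" is just a hash, and candidates are compared verbatim against addresses, SIDs, and labels, which ignores most of Augment's real link syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: index every address, SID, and label in the parse tree.
sub build_link_db {
    my ($statements, $db) = @_;
    $db ||= {};
    for my $node (@$statements) {
        if (ref $node eq 'ARRAY') {
            build_link_db($node, $db);     # nested level
            next;
        }
        for my $key (@{$node}{qw(address sid label)}) {
            next unless defined($key) && length($key);
            $db->{$key} = $node->{sid};    # every key resolves to the SID
        }
    }
    return $db;
}

# Hypothetical helper: return the link candidates in $text found in $db.
sub find_links {
    my ($text, $db) = @_;
    my @candidates = (
        $text =~ /<([^<>]+)>/g,            # angle-bracket style
        $text =~ /\(([^()]+)\)/g,          # older parenthesis style
    );
    return grep { exists $db->{$_} } @candidates;
}

my $db = build_link_db([
    { address => '1', sid => '02', label => 'Introduction', data => 'Introduction' },
    [ { address => '1A', sid => '03', label => '', data => 'See <Introduction> (not a link).' } ],
]);

my @links = find_links('See <Introduction> (not a link).', $db);
print "@links\n";
```

As in Augment itself, anything that looks like a link but is not in the database is simply ignored, so parenthetical remarks pass through untouched.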
An additional challenge is that Augment had a number of linking semantics not supported by HTML links, such as sophisticated addressing and indirect links. These can be mapped to the XML XLink specification fairly easily, but determining how to map these XLinks to HTML links is a nontrivial problem.
Both of the above issues are opportunities for synergy with the main OHS development. For example, we could use the OHS link database specification to generate a database of Augment links. We could also use the OHS XML->HTML transcoder to determine how XLink links are converted to HTML links. Of course, both of these components are currently non-existent.
Augment had a fairly generic markup language that did not specify things such as headlines, lists, tables, etc. It would be nice to develop a more sophisticated set of rules that did a better job of deciding whether something should be an HTML list or table.
This script was developed primarily as a quick and dirty way to let Doug post old and new Augment documents on the Web in an addressable manner. Eventually, this script should convert Augment files to XML, which could then be transcoded into HTML. Once an appropriate DTD is developed, this should be fairly trivial, because the Augment file is converted into an intermediate parse tree that could easily be used to generate all sorts of output.
Shinya Yamada <shinya@bootstrap.org> wrote the first Augment->HTML convertor in Java, and released it on August 20, 2000. Doug Engelbart <doug@bootstrap.org> made changes to his export script and suggested improvements to Shinya's work, which led to this rewrite of the convertor in Perl.
I released the first version of a2h.pl on October 6, 2000. On October 9, 2000, I rewrote and released Augment.pm, an object-oriented version of the appropriate a2h.pl functions.
Eugene Eric Kim <eekim@eekim.com>
Copyright © 2001-2003 Eugene Eric Kim / eekim@eekim.com. All rights reserved. Revision: $Id: Augment.html,v 1.1 2001/05/24 22:40:38 eekim Exp $