NAME

Augment.pm - class for accessing/converting Augment files


SYNOPSIS

  use Augment;

  my $augment = Augment->new('file.txt');
  $augment->writeHTML;
  print "file.txt converted and written to " . $augment->getHTMLfilename . "\n";
  $augment->destroy;

DESCRIPTION

Parses an exported Augment file (generated by Doug Engelbart from his Augment system) into a Perl object that can be converted into other formats. Currently, HTML is the only supported output format.


EXPORTED AUGMENT FILE

The exported Augment file contains formatted ASCII text that is automatically generated by one of Doug's Augment scripts, with some manual tweaking here and there. Doug's script does the following.

First, it places the Augment filename and the date of the file on the first and third lines of the exported file, respectively. Within the filename, directories are delimited by commas, with the exception of the trailing comma, which can be used to add addressing information. For example:

    AUGMENT,132505,

refers to the file 132505 in the AUGMENT directory.

Second, it formats Augment addressing information (hierarchical address, statement ID (SID), and optional label) and prepends it to the statement. The resulting string is delimited by colons, and is separated from the statement by a pound sign. For example:

    12B:0177:Reference-2#Reference-2: A reference.

The string preceding the pound sign is the address string, with a hierarchical address of 12B, a statement ID of 0177, and the label ``Reference-2''.

Hierarchical addresses could easily be computed by a2h.pl, but since that information is already in Augment, we figured we might as well take advantage of it. Augment statement IDs are always prefixed by a zero. The third field, the label, is optional.
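
As a rough illustration of the format described above, the following sketch splits one exported line into its address fields and statement text. The helper name parse_statement_line and the exact regular expression are assumptions made for this example; they are not taken from Augment.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Augment.pm): split one exported line
# into its address fields and statement text. Returns undef if the line
# does not start with an address string.
sub parse_statement_line {
    my ($line) = @_;

    # address : SID (leading zero) : optional label # statement text
    # The character classes are assumptions based on the examples above.
    return unless $line =~ /^\s*([0-9A-Za-z]+):(0\d+):([^#:]*)#(.*)$/;

    return {
        'address' => $1,
        'sid'     => $2,
        'label'   => $3,
        'data'    => $4,
    };
}

my $s = parse_statement_line('12B:0177:Reference-2#Reference-2: A reference.');
print "$s->{'address'} $s->{'sid'} $s->{'label'}\n";   # prints "12B 0177 Reference-2"
```

Note that a pattern this simple would also match ordinary prose that happens to look like an address string, so a real parser would probably need stricter rules.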

Third, it indents statements by multiples of three spaces and word-wraps them at 80 columns. It typically separates statements by a blank line. However, Augment does not treat newlines as statement delimiters. In other words, you cannot assume that paragraphs separated by blank lines are unique statements. Because of this behavior, the only way to identify a unique statement in an exported Augment file is by the address string that precedes it.

Some Augment files include directives meant primarily for printed output. They generally look something like:

    .SnfShow=Off;

Doug's script is supposed to turn these off before outputting text, but it doesn't look like he's made that modification yet, so this script tries to identify and strip these directives itself.
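
One way such directives might be stripped is sketched below. The pattern is a guess generalized from the single example above (a period, a name, an equals sign, a value, and a semicolon), and the helper name strip_directives is hypothetical; the rules Augment.pm actually uses may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Augment.pm): remove print directives
# of the assumed form ".Name=Value;". The directive must start at the
# beginning of the text or right after whitespace, so that ordinary
# sentence-ending periods are left alone.
sub strip_directives {
    my ($text) = @_;
    $text =~ s/(?<!\S)\.[A-Za-z]\w*=[^;\s]*;//g;
    return $text;
}

print strip_directives('.SnfShow=Off;Visible text'), "\n";   # prints "Visible text"
```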

This is what an exported Augment file might look like:

    0:01:#Sample Augment file

    1:02:Introduction#Introduction

       1A:03:#This is the introductory paragraph.

       1B:04:#This is the second paragraph.

DATA STRUCTURE

a2h.pl parses the file into a hash that stores some metadata and a reference to the parse tree. The keys to the hash are:

    'filename' => Augment filename,
    'date' => publication date,
    'statements' => reference to parse tree

The parse tree is a reference to an array. Each element of that array is a reference either to a statement hash or to another array of the same form, which holds statements nested one level deeper. The statement hash contains the following:

    $statement = {
        'address' => hierarchical address,
        'sid' => statement ID,
        'label' => optional label,
        'data' => statement data
    };

The following is a populated $statements data structure corresponding to the sample exported Augment file shown above.

    $statements = [
        { 'address' => '0',
          'sid' => '01',
          'label' => '',
          'data' => 'Sample Augment file'
        },
        { 'address' => '1',
          'sid' => '02',
          'label' => 'Introduction',
          'data' => 'Introduction'
        },
        [
            { 'address' => '1A',
              'sid' => '03',
              'label' => '',
              'data' => 'This is the introductory paragraph.'
            },
            { 'address' => '1B',
              'sid' => '04',
              'label' => '',
              'data' => 'This is the second paragraph.'
            },
        ]
    ];
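
The following is a minimal sketch of how a structure like this could be built from the statement lines of an exported file. It is not the code in Augment.pm: the function name build_tree, the address pattern, and the rule of one nesting level per three spaces of indentation are assumptions. For simplicity it joins continuation lines with single spaces and ignores the filename and date lines.

```perl
use strict;
use warnings;

# Hypothetical sketch (not part of Augment.pm): build a nested
# $statements structure from the statement lines of an exported file.
sub build_tree {
    my (@lines) = @_;
    my $root  = [];
    my @stack = ($root);   # $stack[$depth] is the array for that depth
    my $current;           # statement that continuation lines extend

    for my $line (@lines) {
        if (my ($indent, $address, $sid, $label, $data) =
                $line =~ /^( *)([0-9A-Za-z]+):(0\d+):([^#:]*)#(.*)$/) {
            my $depth = int(length($indent) / 3);
            $depth = @stack if $depth > @stack;   # descend one level at most
            if ($depth == @stack) {
                my $children = [];                # open a new nested array
                push @{ $stack[-1] }, $children;
                push @stack, $children;
            }
            else {
                $#stack = $depth;                 # climb back up if needed
            }
            $current = {
                'address' => $address,
                'sid'     => $sid,
                'label'   => $label,
                'data'    => $data,
            };
            push @{ $stack[-1] }, $current;
        }
        elsif (defined $current && $line =~ /\S/) {
            # No address string: this line continues the current statement.
            (my $text = $line) =~ s/^\s+|\s+$//g;
            $current->{'data'} .= ($current->{'data'} eq '' ? '' : ' ') . $text;
        }
    }
    return $root;
}

my $statements = build_tree(
    '0:01:#Sample Augment file',
    '',
    '1:02:Introduction#Introduction',
    '',
    '   1A:03:#This is the introductory',
    '   paragraph.',
    '',
    '   1B:04:#This is the second paragraph.',
);
print scalar(@$statements), " top-level entries\n";   # prints "3 top-level entries"
```

Keeping a stack of the arrays currently open, one per depth, means a statement at any depth can be appended without searching the tree.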

METHODS


new($fname, %options)

 pre:
   $fname - name of exported Augment file
   $options{'html_files_dir'} - directory where converted HTML files go.
       If not specified, set to $_HTML_FILES_DIR.

 post:
   none

Augment constructor. Reads an exported Augment file, and parses it into an internal data structure.


writeHTML($a_fname)

 pre:
   $a_fname - Augment filename.

 post:
   none

Generates HTML from the parse tree, converts Augment filename to HTML filename, and writes the HTML file.


getHTMLfilename

 pre:
   none

 post:
   none

Returns the corresponding HTML filename of an Augment file.


destroy

 pre:
   none

 post:
   none

Augment destructor. Undefines the Augment object, thus allowing the garbage collector to free its memory.


PRIVATE METHODS


_parse_file($input_file)

 pre:
   $input_file - name of exported Augment file to convert.

 post:
   \%parse_tree - statements + metadata (described above).

Parses the input file and returns the file's metadata together with the parse tree populated with data from the parsed file.


_statements_to_html($statements, $fh, $indent_level)

 pre:
   $statements - parse tree (w/o root metadata)
   $fh - file handle to which the converted HTML is printed.
   $indent_level - current indentation level.  Used to determine which
       HTML header style to print.

 post:
   none

Recursive function that traverses the parse tree and prints HTML statements with appropriate addresses, indentation, and other special formatting (such as the infamous ``purple'' numbers).
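
The recursion can be pictured with the sketch below. It is illustrative only: the markup, the anchor names, the "purple" class, and the function names are assumptions rather than what Augment.pm emits, and the label field is ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical sketch of a recursive walk over $statements. Each statement
# becomes a paragraph with an anchor named after its SID, followed by a
# small link that shows its hierarchical address.
sub statements_to_html {
    my ($statements, $fh, $indent_level) = @_;
    for my $node (@$statements) {
        if (ref $node eq 'ARRAY') {
            # Nested array: recurse one level deeper.
            statements_to_html($node, $fh, $indent_level + 1);
            next;
        }
        my $pad = '  ' x $indent_level;
        printf {$fh} qq{%s<p><a name="%s"></a>%s <a class="purple" href="#%s">%s</a></p>\n},
            $pad, $node->{'sid'}, escape_html($node->{'data'}),
            $node->{'sid'}, $node->{'address'};
    }
}

my $statements = [
    { 'address' => '1', 'sid' => '02', 'label' => 'Introduction',
      'data' => 'Introduction' },
    [
        { 'address' => '1A', 'sid' => '03', 'label' => '',
          'data' => 'This is the introductory paragraph.' },
    ],
];
statements_to_html($statements, \*STDOUT, 0);
```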


_html_header($fh, $fname, $date)

 pre:
   $fh - file handle to which the converted HTML is printed.
   $fname - name of Augment file
   $date - date of Augment file

 post:
   none

Prints the HTML header tags with embedded stylesheet and other metadata.


_html_footer($fh)

 pre:
   $fh - file handle to which the converted HTML is printed.

 post:
   none

Prints the HTML footer tags.


_convert_filename($a_fname)

 pre:
   $a_fname - original Augment filename

 post:
   $a_fname - converted filename

Converts an Augment filename to a Web/UNIX-friendly filename by replacing the trailing comma with '.html' and all other commas with forward slashes.
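
A minimal sketch of the conversion described above follows. The function name convert_filename is hypothetical, and the sketch assumes the filename ends with the trailing comma, with no addressing information after it.

```perl
use strict;
use warnings;

# Hypothetical sketch of the filename conversion described above.
sub convert_filename {
    my ($a_fname) = @_;
    $a_fname =~ s/,$/.html/;   # the trailing comma becomes the extension
    $a_fname =~ tr{,}{/};      # remaining commas become path separators
    return $a_fname;
}

print convert_filename('AUGMENT,132505,'), "\n";   # prints "AUGMENT/132505.html"
```

The order matters: the trailing comma has to be replaced first, or it would be turned into a slash along with the others.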


_create_subdirectories($path)

 pre:
   $path - fully qualified UNIX path and filename

 post:
   none

Creates the appropriate directories if they do not already exist.
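
As a sketch of one way to do this, the example below uses the core File::Basename and File::Path modules. The function name create_subdirectories and the choice of make_path are assumptions; the original may do this differently. The usage example works inside a temporary directory so that it leaves nothing behind.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical sketch: make sure the directory part of $path exists.
# make_path creates any missing intermediate directories.
sub create_subdirectories {
    my ($path) = @_;
    my $dir = dirname($path);
    make_path($dir) unless -d $dir;
    return $dir;
}

# Usage example in a temporary directory that is removed on exit.
my $base = tempdir(CLEANUP => 1);
my $dir  = create_subdirectories("$base/AUGMENT/132505.html");
print "created $dir\n" if -d $dir;
```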


TO DO


Better Exception Handling

Augment.pm does very little error handling. This is a bad thing.


Better Object-Orientation

It might be nice to have a generic write() method and to separate the HTML-related methods into a subclass. Then anyone who wanted to write a new conversion module could simply subclass Augment and override write().

There's not a great need for this. It's fairly straightforward to add new methods to the class as it currently stands.


Generic Manipulation Methods

It might be nice to have some generic functions for manipulating the parse tree, perhaps a traverse() method. However, as I said before, there's no great need for this right now; time is better spent on other areas, especially the ones listed below.
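
For illustration, a traverse() function along these lines might look like the sketch below. It is a hypothetical design, not existing code: it is written as a plain function over the $statements structure described under DATA STRUCTURE, and the callback signature is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: call $callback->($statement, $depth) for every
# statement hash in the parse tree, in document order.
sub traverse {
    my ($statements, $callback, $depth) = @_;
    $depth = 0 unless defined $depth;
    for my $node (@$statements) {
        if (ref $node eq 'ARRAY') {
            traverse($node, $callback, $depth + 1);   # nested statements
        }
        else {
            $callback->($node, $depth);
        }
    }
}

my $statements = [
    { 'address' => '0', 'sid' => '01', 'label' => '', 'data' => 'Sample Augment file' },
    { 'address' => '1', 'sid' => '02', 'label' => 'Introduction', 'data' => 'Introduction' },
    [
        { 'address' => '1A', 'sid' => '03', 'label' => '', 'data' => 'First.' },
        { 'address' => '1B', 'sid' => '04', 'label' => '', 'data' => 'Second.' },
    ],
];

# Example: print each address indented by its depth.
traverse($statements, sub {
    my ($statement, $depth) = @_;
    print '   ' x $depth, $statement->{'address'}, "\n";
});
```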


Augment Links

This version of a2h.pl does not convert Augment links to HTML links. This is nontrivial for a number of reasons. Syntactically, any text in an Augment file delimited by parentheses or angle brackets is potentially a link. (At some point, Doug's Augment team standardized on angle brackets for their link format, but some documents still use parentheses.) In Augment, if you pointed at text delimited this way and tried to jump to it, Augment would follow it if it were a valid link (i.e., an entry in the link database); otherwise, Augment would simply ignore the command.

In order to do Augment link conversion, this script should assemble all of the legal addresses within this document and store them in a link database. It should then identify anything that looks like a link, and search the database for such a link. If that link exists, then it should create the appropriate HTML link.
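
The approach in the previous paragraph could be sketched roughly as follows. Everything here is hypothetical: the function names, the use of a plain hash as the link database, and especially the candidate pattern, which only recognizes a bare address, SID, or label between delimiters and does not attempt real Augment link syntax.

```perl
use strict;
use warnings;

# Hypothetical sketch: collect every address, SID, and non-empty label in
# the parse tree into a hash that maps it to the SID of its statement.
sub build_link_db {
    my ($statements, $db) = @_;
    $db ||= {};
    for my $node (@$statements) {
        if (ref $node eq 'ARRAY') {
            build_link_db($node, $db);
            next;
        }
        for my $key (@$node{qw(address sid label)}) {
            next unless defined($key) && length($key);
            $db->{$key} = $node->{'sid'};
        }
    }
    return $db;
}

# Hypothetical sketch: return the delimited strings in $text that are
# entries in the link database. Anything else is ignored, much as
# Augment ignored a jump to an invalid link.
sub link_candidates {
    my ($text, $db) = @_;
    my @found;
    while ($text =~ /[<(]([^<>()\s]+)[>)]/g) {
        push @found, $1 if exists $db->{$1};
    }
    return @found;
}

my $db = build_link_db([
    { 'address' => '1', 'sid' => '02', 'label' => 'Introduction', 'data' => 'Introduction' },
    [ { 'address' => '1B', 'sid' => '04', 'label' => '', 'data' => 'Second.' } ],
]);
my @links = link_candidates('See <Introduction> and (1B), but not <Nowhere>.', $db);
print join(', ', @links), "\n";   # prints "Introduction, 1B"
```

Each match could then be rewritten as an HTML link to the anchor for the stored SID.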

An additional challenge is that Augment had a number of linking semantics not supported by HTML links, such as sophisticated addressing and indirect links. These can be mapped to the XML XLink specification fairly easily, but determining how to map these XLinks to HTML links is a nontrivial problem.

Both of the above issues are opportunities for synergy with the main OHS development. For example, we could use the OHS link database specification to generate a database of Augment links. We could also use the OHS XML->HTML transcoder to determine how XLink links are converted to HTML links. Of course, both of these components are currently non-existent.


Improved parsing

Augment had a fairly generic markup language that did not specify things such as headlines, lists, tables, etc. It would be nice to develop a more sophisticated set of rules that did a better job of deciding whether something should be an HTML list or table.


Augment->XML

This script was developed primarily as a quick and dirty way to let Doug post old and new Augment documents on the Web in an addressable manner. Eventually, this script should convert Augment files to XML, which could then be transcoded into HTML. Once an appropriate DTD is developed, this should be fairly trivial, because the Augment file is converted into an intermediate parse tree that could easily be used to generate all sorts of output.


HISTORY

Shinya Yamada <shinya@bootstrap.org> wrote the first Augment->HTML convertor in Java, and released it on August 20, 2000. Doug Engelbart <doug@bootstrap.org> made changes to his export script and suggested improvements to Shinya's work, which led to this rewrite of the convertor in Perl.

I released the first version of a2h.pl on October 6, 2000. On October 9, 2000, I rewrote and released Augment.pm, an object-oriented version of the appropriate a2h.pl functions.


AUTHOR

Eugene Eric Kim <eekim@eekim.com>