Purple Metadata in Blosxom

PurpleWiki supports document metadata. The metadata is stored at the beginning of the document using the following syntax:    (6Y)

  {name value}    (6Z)

Currently, PurpleWiki supports the following metadata:    (70)

All of this metadata is available to blosxom via the purple plugin under the $purple namespace. However, I’d also like to override blosxom’s mechanism for determining an entry’s title and date.    (77)
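For illustration, an entry might start with a couple of metadata lines like these (the field names, and the $purple::title variable mentioned below, are assumptions for the sake of the example rather than PurpleWiki’s documented interface):

  {title My First Entry}
  {date 2003-08-28}

  The body of the entry follows in ordinary PurpleWiki markup.

A flavour template could then interpolate something like $purple::title wherever blosxom’s own $title would normally go.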

Sadly, overriding the title mechanism is not possible via plugins, and overriding the date would be extremely inefficient. Blosxom assumes that the first line of a file is the title and the rest is the body. This parsing happens before story() is called. This works fine, but it means that the raw file itself can’t be a pure PurpleWiki document. If blosxom instead passed the raw file and delegated the responsibility for parsing title and body to story(), then anyone would be able to customize this behavior via plugins.    (78)
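To make the hypothetical concrete, here is a rough sketch of the kind of story() hook I have in mind, assuming blosxom handed the raw file to plugins rather than pre-splitting title and body (which it does not do today). The metadata fields are the same hypothetical ones as above, and the parsing is deliberately simplified:

  # Hypothetical: this only works if blosxom passes the raw file through untouched.
  sub story {
      my ($pkg, $path, $filename, $story_ref, $title_ref, $body_ref) = @_;

      # Peel {name value} metadata lines off the top of the raw entry.
      my %meta;
      while ($$body_ref =~ s/^\{(\w+)\s+(.*?)\}\s*\n//) {
          $meta{$1} = $2;
      }

      # Let the metadata, not the first line of the file, supply the title.
      $$title_ref = $meta{title} if defined $meta{title};
      return 1;
  }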

I also wanted to duplicate the behavior of the entries_index_tagged plugin using PurpleWiki’s metadata. The way entries_index_tagged works is to compare the current directory tree with a cached tree. If there are discrepancies, entries_index_tagged scans the file for a meta-creation tag. If the tag exists, it uses that information; otherwise, it stats the file for its creation date.    (79)
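For reference, the per-file logic amounts to something like the following Perl sketch (this is the idea, not the plugin’s actual source, and the tag syntax it scans for may differ):

  use Date::Parse qw(str2time);    # turn a tag value into epoch seconds

  # Prefer an embedded creation tag; otherwise fall back to stat().
  sub entry_date {
      my ($file) = @_;
      if (open my $fh, '<', $file) {
          while (my $line = <$fh>) {
              # hypothetical tag format; the real plugin's syntax may differ
              if ($line =~ /^meta-creation:\s*(.+)$/) {
                  close $fh;
                  return str2time($1);
              }
          }
          close $fh;
      }
      return (stat $file)[9];      # the file's modification time
  }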

I could easily pass the file to the PurpleWiki parser to scan for the creation date, but the PurpleWiki parser is expensive, and the parsed file would not be cached. If I opted to use an entries_cache approach instead, where the comparison only happens once an hour, then maybe it wouldn’t be too much of an issue. However, I don’t want this feature badly enough to do this, and frankly, the inefficiency of doing it this way gnaws at me.    (7A)

Purple Numbers and Link Integrity

Danny Ayers is looking to implement Purple Numbers in his Wiki, and he had the following question:    (66)

But is the expectation that the anchor will always refer to the same information item?    (67)

If we’re going for coolness, I think this may cause problems in the context of Wikis. Ok, pages come and go but the URI will (usually) always address something sensible – edit new page if the one originally addressed has gone.    (68)

But the Purple anchors are pointing to info-snippets that may be modified (no problem – it’s still conceptually the same item) or be deleted (problem).    (69)

The expectation is that the information to which an anchor points may change. This is obviously not ideal.    (6A)

In March 2001, I wrote some notes for the Open Hyperdocument System entitled, “Thoughts on Link Integrity.” I had posted those notes to a mailing list, but those archives no longer exist, so I reproduce those thoughts below.    (6B)

Danny also mentioned using trackback as an aggregator of comments. There is such a system, which Chris Dent also mentions in his response: Internet Topic Exchange. We used it for the 2003 PlaNetwork Conference to aggregate blogs about the conference.    (6C)

Thoughts on Link Integrity    (6D)

We want the OHS to maintain link integrity across all documents. In other words, once you create a link to something, it should never break.    (6E)

The first requirement for link integrity is that documents are never deleted from the system. If you link to a document, and that document is subsequently removed, the link breaks. The only way to fix that link is to put the document back into the system.    (6F)

The second requirement is to have a logical naming scheme that is separate from the physical name and location of a document. On the web, if you have the document http://foo.com/bar.html, and you move it to http://foo.com/new/bar.html, links to the first URL break. You need a name for that document that will always point to the right place, even if the document is physically moved to a different part of the system.    (6G)
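One minimal way to picture this requirement is a level of indirection between a document’s stable name and wherever it currently lives, so that links carry only the stable name. A toy sketch, with a made-up ID and the paths from the example above:

  # Toy indirection table: links store the stable ID; only this table
  # changes when a document moves.
  my %location_of = (
      'doc-0042' => 'http://foo.com/bar.html',
  );

  # After the move, update the table instead of every link:
  $location_of{'doc-0042'} = 'http://foo.com/new/bar.html';

  sub resolve {
      my ($id) = @_;
      return $location_of{$id};
  }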

The third requirement is version control. This is where things start to get a little hairy. Version-controlled systems are insert-only. In theory, nothing is ever removed. This satisfies the first requirement.    (6H)

However, in a useful DKR, links not only don’t break, they also evolve. Suppose you have a document, foo.txt, that contains the following text:    (6I)

  These are the dasy that try men's souls.    (6J)
  Example. foo.txt, version 1.    (6K)

Note that there’s a typo — “dasy” should be “days.”    (6L)

Now suppose someone creates a link to this sentence in this version of the document. Suppose that afterwards, you notice the typo and correct it. This results in a new version of the document:    (6M)

  These are the days that try men's souls.    (6N)
  Example. foo.txt, version 2.    (6O)

If your links neither broke nor evolved, then the original link would continue to point to version 1 of the document, not this new version. However, this does not always seem to be desirable behavior. If I created a link to this sentence — essentially designating it interesting and relevant content — when the typo is corrected, I’d prefer that the link now point to the corrected document, version 2.    (6P)

This is certainly doable. The system could automatically assume that the link pointing to the first sentence in version 1 should now point to the first sentence in version 2.    (6Q)

However, there are two scenarios when this would not be the correct behavior. First, what if, instead of fixing the typo, the sentence was changed to:    (6R)

  Livin' la vida loca.    (6S)
  Example. foo.txt, version 3.    (6T)

If the purpose of the link is to designate the target content as relevant, then the content of the first sentence of this third version no longer applies, because the meaning of the sentence has completely reversed.    (6U)

Second, what if the link is from an annotation that says, “There’s a typo in this sentence”? In this case, you would want the link to point only to version 1, since the typo does not exist in version 2 (and, for that matter, in version 3).    (6V)

How can we accommodate these scenarios? One solution would be to allow the user to define how the link should evolve with new versions of the document. So, for example, you could specify that the link that points to the first sentence in version 1 should also point to the first sentence in some number of subsequent versions of foo.txt.    (6W)
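As a sketch of what such a user-defined policy might look like as data, with made-up field names: each link records the exact anchor it was created against, plus a rule for how it should follow new versions.

  # Hypothetical link records illustrating the two behaviors discussed above.
  my @links = (
      {   # "this sentence is relevant": follow the sentence into new versions
          target => { doc => 'foo.txt', version => 1, node => 'sentence-1' },
          evolve => 'track-latest',
      },
      {   # "there's a typo in this sentence": meaningful only for version 1
          target => { doc => 'foo.txt', version => 1, node => 'sentence-1' },
          evolve => 'pin-to-version',
      },
  );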

Another solution would be to have the system automatically notify everyone who has linked to a document (or who has otherwise registered for notification) that the document has changed, and have those people manually update their links, possibly providing suggestions as to how to update the links.    (6X)

OHS Launch Community: Experimenting with Ontologies

My review of The Semantic Web resulted in some very interesting comments. In particular, Danny Ayers challenged my point about focusing on human-understandable ontologies rather than machine-understandable ones:    (5D)

But…”I think it would be significantly cheaper and more valuable to develop better ways of expressing human-understandable ontologies”. I agree with your underlying point here, but think it’s just the kind of thing the Semantic Web technologies can help with. The model used is basically very human-friendly – just saying stuff about things, using (triple) statements.    (5E)

Two years ago, I set out to test this very claim by creating an ad-hoc community — the OHS Launch Community — and by making the creation of a shared ontology one of our primary goals. I’ll describe that experience here, and will address the comments in more detail later. (For now, see Jay Fienberg’s blog entry, “Semantic web 2003: not unlike TRS-80 in the 1970’s.” Jay makes a point that I want to echo in a later post.)    (5F)

“Ontologies?!”    (5G)

I first got involved with Doug Engelbart’s Open Hyperdocument System (OHS) project in April 2000. For the next six months, a small group of committed volunteers met weekly with Doug to spec out the project and develop strategy.    (5H)

While some great things emerged from our efforts, we ultimately failed. There were a lot of reasons for that failure — I could write a book on this topic — but one of the most important reasons was that we never developed Shared Language.    (5I)

We had all brought our own world-views to the project, and — more problematically — we used the same terms differently. We did not have to agree on a single world-view — on the contrary, that would have hurt the collaborative process. However, we did need to be aware of each other’s world-views, even if we disagreed with them, and we needed to develop a Shared Language that would enable us to communicate more effectively.    (5J)

I think many people understand this intuitively. I was lucky enough to have it thrown in my face, thanks to the efforts of Jack Park and Howard Liu. At the time, Jack and Howard worked at Vertical Net with Leo Obrst, one of the authors of The Semantic Web. Howard, in fact, was an Ontologist. That was his actual job title! I had taken enough philosophy in college to know what an ontology was in that context, but somehow, I didn’t think that had any relevance to Howard’s job at Vertical Net.    (5K)

At our meetings, Jack kept saying we needed to work out an ontology. Very few of us knew what he meant, and both Jack and Howard did a very poor job of explaining what an ontology was. I mention this not to dis Jack and Howard — both of whom I like and respect very much — but to make a point about the entire ontology community. In general, I’ve found that ontologists are very poor at explaining what an ontology is. This is somewhat ironic, given that ontologies are supposed to clarify meaning in ways that a simple glossary can not.    (5L)

Doug himself made this same point in his usual ridiculously lucid manner. He often asked, “How does the ontology community use ontologies?” If ontologies were so crucial to effective collaboration, then surely the ontology community used ontologies when collaborating with each other. Sadly, nobody ever answered his question.    (5M)

OHS Launch Community    (5N)

At some point, something clicked. I finally understood what ontologies (in an information sciences context) were, and I realized that developing a shared ontology was an absolute prerequisite for collaboration to take place. Every successful community of practice had developed a shared ontology, whether its members were aware of it or not.    (5O)

Not wanting our OHS efforts to fade into oblivion, I asked a subset of the volunteers to participate in a community experiment, which — at Doug’s suggestion — we called the OHS Launch Community. Our goal was not to develop the OHS. Our goal was to figure out what we all thought the OHS was. We would devote six months towards this goal, and then decide what to do afterwards. My theory was that collectively creating an explicit ontology would be a tipping point in the collaborative process. Once we had an ontology, progress on the OHS would flow naturally.    (5P)

My first recruits were Jack and Howard, and Howard agreed to be our ontology master. We had a real, live Ontologist as our ontology master! How could we fail?!    (5Q)

Mixed Results    (5R)

Howard suggested using Protege as our tool for developing an ontology. He argued that the group would find the rigor of a formally expressed ontology useful, and that we could subsequently use the ontology to develop more intelligent search mechanisms for our knowledge repository.    (5S)

We agreed. Howard and I then created a highly iterative process for developing the formal ontology. Howard would read papers and follow mailing list discussions carefully, construct the ontology, and post updated versions early and often. He would also use Protege on an overhead projector during face-to-face discussions, so that people could watch the ontology evolve in real-time.    (5T)

Howard made enough progress to make things interesting. He developed some preliminary ontologies from some papers he had read, and he demonstrated and explained this work at one of our first meetings. Unfortunately, things suddenly got very busy for him, and he had to drop out of the group.    (5U)

That was the end of the formal ontology part of the experiment, but not of the experiment itself. First, we helped ourselves by collectively agreeing that developing a shared ontology was a worthwhile goal. This, and picking an end-date, helped us eliminate some of the urgency and anxiety about “making progress.” Developing Shared Language can be a frustrating experience, and it was extremely valuable to have group buy-in about its importance up-front.    (5V)

Second, we experimented with a facilitation technique called Dialogue Mapping. Despite my complete lack of experience with this technique (I had literally learned it from Jeff Conklin, its creator, the day before our first meeting), it turned out to be extremely useful. We organized a meeting called, “Ask Doug Anything,” which I facilitated and captured using a tool called Quest Map. It was essentially the Socratic Method in reverse. We asked questions, and Doug answered them. The only way we were allowed to challenge him or make points of our own was in the form of a question.    (5W)

That meeting was a watershed for me, because I finally understood Doug’s definition of a Dynamic Knowledge Repository. (See the dialog map of that discussion.) One of the biggest mistakes people make when discussing Doug’s work is conflating Open Hyperdocument System with Dynamic Knowledge Repository. Most of us had made that mistake, which prevented us from communicating clearly with Doug, and from making progress overall.    (5X)

Epilogue    (5Y)

We ended the Launch Community on November 9, 2001, about five months after it launched. We never completed the ontology experiment to my satisfaction, but I definitely learned many things. We also accomplished many of our other goals. We wanted to be a bootstrapping community, Eating Our Own Dogfood, running a lot of experiments, and keeping records of our experiences. We also wanted to facilitate collaboration between our members, most of whom were tool developers. Among our many accomplishments were:    (5Z)

The experiment was successful enough for me to propose a refined version of the group as an official entity of the Bootstrap Alliance, called the OHS Working Group. The proposal was accepted, but sadly, got sidetracked. (Yet another story for another time.) In many ways, the Blue Oxen collaboratories are the successors to the OHS Working Group experiment. We’ve adopted many of the same principles, especially the importance of Shared Language and bootstrapping.    (64)

I believe, more than ever, that developing a shared ontology needs to be an explicit activity when collaborating in any domain. I’ll discuss where or whether Semantic Web technologies fit in, in a later post.    (65)

Do We Need the Semantic Web?

The Semantic Web, by Michael DaConta, Leo Obrst, and Kevin Smith (Wiley 2003), is a good book. I’ve worked with Michael a bit in an editorial context, and I’ve enjoyed some of his other writing. He thinks and explains things clearly, and this book is no exception. I especially enjoyed how The Semantic Web crisply defined a number of hairy concepts — ontologies, taxonomies, semantics, etc. With some restructuring and condensing — there is some technical detail that isn’t that important, and the sections on ontologies could be more cohesive and should come earlier — this book could go from good to great.    (4V)

My goal here, however, is not to review The Semantic Web. My goal here is to complain about its premise.    (4W)

The authors say that the Semantic Web is about making data smarter. If we expend some extra effort making our data machine-understandable, then machines can do a better job of helping us with that data. By “machine-understandable,” the authors mean making the machines understand the data the same way we humans do. However, the authors make a point early in the book of separating their claims from those of AI researchers in the 1960s and 1970s. They are not promising to make machines as smart as humans. They are claiming that we can exploit machine capabilities more fully, presumably so that machines can better augment human capabilities.    (4X)

The authors believe that the Semantic Web will have an enormous positive effect on society, just as soon as it catches on. There’s the rub. It hasn’t. The question is why.    (4Y)

The answer lies with two related questions: What’s the cost, and what’s the return?    (4Z)

Consider the return first. Near the end of the book, the authors say:    (50)

With the widespread development and adoption of ontologies, which explicitly represent domain and cross-domain knowledge, we will have enabled our information technology to move upward — if not a quantum leap, then at least a major step — toward having our machines interact with us at our human conceptual level, not forcing us human beings to interact at the machine level. We predict that the rise in productivity at exchanging meaning with our machines, rather than semantically uninterpreted data, will be no less revolutionary for information technology as a whole. (238)    (51)

The key phrase above is, “having our machines interact with us at our human conceptual level, not forcing us human beings to interact at the machine level.” There are two problems with this conclusion. First, machines interacting with humans at a human conceptual level sounds awfully like artificial intelligence. Second, the latter part of this phrase contradicts the premise of the book. To make the Semantic Web happen, humans have to make their data “smarter” by interacting at the machine level.    (52)

That leads to the cost question: How much effort is required to make data smarter? I suppose the answer to that question depends on how you read the book, but it seems to require quite a bit. Put aside the difficulties with RDF syntax — those can be addressed with better tools. I’m concerned about the human problem of constructing semantic models. This is a hard problem, and tools aren’t going to solve it. Who’s going to be building ontologies? I don’t think regular folks will, and if I’m right, then that makes it very difficult to expect a network effect on the order of the World Wide Web.    (53)

Human-Understandable Ontologies    (54)

There were three paragraphs in the book that really struck me:    (55)

Semantic interpretation is the mapping between some structured subset of data and a model of some set of objects in a domain with respect to the intended meaning of those objects and the relationships between those objects.    (56)

Typically, the model lies in the mind of the human. We as humans “understand” the semantics, which means we symbolically represent in some fashion the world, the objects of the world, and the relationships among those objects. We have the semantics of (some part of) the world in our minds; it is very structured and interpreted. When we view a textual document, we see symbols on a page and interpret those with respect to what they mean in our mental model; that is, we supply the semantics (meaning). If we wish to assist in the dissemination of the knowledge embedded in a document, we make that document available to other human beings, expecting that they will provide their own semantic interpreter (their mental models) and will make sense out of the symbols on the document pages. So, there is no knowledge in that document without someone or something interpreting the semantics of that document. Semantic interpretation makes knowledge out of otherwise meaningless symbols on a page.    (57)

If we wish, however, to have the computer assist in the dissemination of the knowledge embedded in a document — truly realize the Semantic Web — we need to at least partially automate the semantic interpretation process. We need to describe and represent in a computer-usable way a portion of our mental models about specific domains. Ontologies provide us with that capability. This is a large part of what the Semantic Web is all about. The software of the future (including intelligent agents, Web services, and so on) will be able to use the knowledge encoded in ontologies to at least partially understand, to semantically interpret, our Web documents and objects. (195-197)    (58)

To me, these paragraphs beautifully explain semantics and describe the motivation for the Semantic Web. I absolutely agree with what is being said and how. My concerns are with scope — the cost and benefit questions — and with priority.    (59)

The Semantic Web is only important in so far as it helps humans with our problems. The problem that the Semantic Web is tackling is information overload. In order to tackle that problem, the Semantic Web has to solve the problem of getting machines to understand human semantics. This is related to the problem of getting humans to understand human semantics. To me, solving the problem of humans understanding each other is far more important than getting machines to understand humans.    (5A)

Ontologies are crucial for solving both problems. Explicit awareness of ontologies helps humans communicate. Explicit expression of ontologies helps machines interpret humans. The difference between the two boils down, once again, to costs and returns. The latter costs much more, but the return does not seem to be proportionately greater. I think it would be significantly cheaper and more valuable to develop better ways of expressing human-understandable ontologies.    (5B)

I’m not saying that the Semantic Web is a waste of time. Far from it. I think it’s a valuable pursuit, and I hope that we achieve what the authors claim we will achieve. Truth be told, my inner gearhead is totally taken by some of the work in this area. My concern is that our collective inner gearhead is causing us to lose sight of the original goal. To paraphrase Doug Engelbart, we’re trying to make machines smarter. How about trying to make humans smarter?    (5C)

Santa Maria Steaks at The Hitching Post

About a month ago, my friend Justin mentioned a town near Santa Barbara, California that claimed to have the world’s best barbecue. As I explained a few weeks ago, I claim to be somewhat of an authority on barbecue, having eaten it outside of California. To be so near (well, about 250 miles away) yet so ignorant of a place claiming to be the cradle of barbecue civilization was somewhat of a shock to me.    (4N)

I attempted to right that wrong yesterday at The Hitching Post, a steakhouse in Casmalia. Casmalia is a former mining town in the Santa Maria Valley, about 75 miles north of Santa Barbara.    (4O)

Santa Maria barbecue has its roots in the Spanish ranchers who populated the region in the 1850s. To reward los vaqueros after a successful cattle roundup, the ranchers would throw a feast of top sirloin crusted with garlic salt and pepper and cooked slowly over a red oak fire, served with salsa and pinquitos, a pinkish bean. Both the beans and the wood are native to Santa Maria.    (4P)

As its boastful claim suggests, Santa Maria takes its meat seriously. My challenge was to find a restaurant that specialized in the local fare. The city’s Chamber of Commerce web site was somewhat unhelpful. I couldn’t find a place whose menu jumped out at me as the real deal. Justin suggested The Hitching Post, which seemed to have a good reputation and also produced its own label of wines.    (4Q)

The steaks at The Hitching Post were excellent, the salsa was fresh, the servings were large, the wine (The Hitching Post Pinot Noir Santa Maria 2000) was good, and the price was reasonable. But, I wasn’t satisfied. I had a beef with their beef; namely, I don’t think The Hitching Post served true Santa Maria barbecue.    (4R)

Most people think that barbecue is food cooked over a hot fire. That’s actually grilling. Barbecue is food cooked slowly over a cool fire. The process tenderizes the meat while imbuing it with a delicious, smoky flavor. It’s what makes barbecued ribs or pulled pork literally fall off the bone.    (4S)

The Hitching Post served steaks, not barbecue. True, they used the correct cut of meat — top block sirloin. True, they rubbed it with garlic salt and pepper. True, they cooked it over a red oak fire. True, the steaks were delicious. But, it still wasn’t barbecue. The kicker was that they did not serve pinquitos.    (4T)

I could only conclude that I did not experience the true Santa Maria dining experience. That wrong still needs to be righted. I suspect that next Sunday, I will once again find myself in Santa Maria, searching, hoping, eating. Stay tuned.    (4U)