OHS Launch Community: Experimenting with Ontologies

My review of The Semantic Web resulted in some very interesting comments. In particular, Danny Ayers challenged my point about focusing on human-understandable ontologies rather than machine-understandable ones:    (5D)

But…”I think it would be significantly cheaper and more valuable to develop better ways of expressing human-understandable ontologies”. I agree with your underlying point here, but think it’s just the kind of thing the Semantic Web technologies can help with. The model used is basically very human-friendly – just saying stuff about things, using (triple) statements.    (5E)
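
A quick aside for readers who haven’t seen the model Danny is describing: a “triple” is simply a subject-predicate-object statement. As a minimal sketch, using the rdflib Python library and a made-up vocabulary purely for illustration (nothing here is part of any real OHS ontology), “saying stuff about things” looks like this:

```python
# Hypothetical example: RDF-style triples using rdflib. The ex: vocabulary is invented.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

# Each add() call records one statement: (subject, predicate, object).
g.add((EX.Howard, RDF.type, EX.Person))                    # "Howard is a person."
g.add((EX.Howard, EX.hasJobTitle, Literal("Ontologist")))  # "Howard's job title is Ontologist."
g.add((EX.Howard, EX.worksOn, EX.OHSLaunchCommunity))      # "Howard works on the OHS Launch Community."

print(g.serialize(format="turtle"))
```

Whether a pile of such statements adds up to a human-understandable ontology is the question I’ll come back to in a later post.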

Two years ago, I set out to test this very claim by creating an ad-hoc community — the OHS Launch Community — and by making the creation of a shared ontology one of our primary goals. I’ll describe that experience here, and will address the comments in more detail later. (For now, see Jay Fienberg’s blog entry, “Semantic web 2003: not unlike TRS-80 in the 1970’s.” Jay makes a point that I want to echo in a later post.)    (5F)

“Ontologies?!”    (5G)

I first got involved with Doug Engelbart’s Open Hyperdocument System (OHS) project in April 2000. For the next six months, a small group of committed volunteers met weekly with Doug to spec out the project and develop strategy.    (5H)

While some great things emerged from our efforts, we ultimately failed. There were a lot of reasons for that failure — I could write a book on this topic — but one of the most important reasons was that we never developed Shared Language.    (5I)

We had all brought our own world-views to the project, and — more problematically — we used the same terms differently. We did not have to agree on a single world-view — on the contrary, that would have hurt the collaborative process. However, we did need to be aware of each other’s world-views, even if we disagreed with them, and we needed to develop a Shared Language that would enable us to communicate more effectively.    (5J)

I think many people understand this intuitively. I was lucky enough to have it thrown in my face, thanks to the efforts of Jack Park and Howard Liu. At the time, Jack and Howard worked at Vertical Net with Leo Obrst, one of the authors of The Semantic Web. Howard, in fact, was an Ontologist. That was his actual job title! I had taken enough philosophy in college to know what an ontology was in that context, but somehow, I didn’t think that had any relevance to Howard’s job at Vertical Net.    (5K)

At our meetings, Jack kept saying we needed to work out an ontology. Very few of us knew what he meant, and both Jack and Howard did a very poor job of explaining what an ontology was. I mention this not to dis Jack and Howard — both of whom I like and respect very much — but to make a point about the entire ontology community. In general, I’ve found that ontologists are very poor at explaining what an ontology is. This is somewhat ironic, given that ontologies are supposed to clarify meaning in ways that a simple glossary cannot.    (5L)

Doug himself made this same point in his usual ridiculously lucid manner. He often asked, “How does the ontology community use ontologies?” If ontologies were so crucial to effective collaboration, then surely the ontology community used ontologies when collaborating with each other. Sadly, nobody ever answered his question.    (5M)

OHS Launch Community    (5N)

At some point, something clicked. I finally understood what ontologies (in an information sciences context) were, and I realized that developing a shared ontology was an absolute prerequisite for collaboration to take place. Every successful community of practice had developed a shared ontology, whether its members were aware of it or not.    (5O)

Not wanting our OHS efforts to fade into oblivion, I asked a subset of the volunteers to participate in a community experiment, which — at Doug’s suggestion — we called the OHS Launch Community. Our goal was not to develop the OHS. Our goal was to figure out what we all thought the OHS was. We would devote six months to this goal, and then decide what to do afterwards. My theory was that collectively creating an explicit ontology would be a tipping point in the collaborative process. Once we had an ontology, progress on the OHS would flow naturally.    (5P)

My first recruits were Jack and Howard, and Howard agreed to be our ontology master. We had a real, live Ontologist as our ontology master! How could we fail?!    (5Q)

Mixed Results    (5R)

Howard suggested using Protege as our tool for developing an ontology. He argued that the group would find the rigor of a formally expressed ontology useful, and that we could subsequently use the ontology to build more intelligent search mechanisms for our knowledge repository.    (5S)

We agreed. Howard and I then created a highly iterative process for developing the formal ontology. Howard would read papers and follow mailing list discussions carefully, construct the ontology, and post updated versions early and often. He would also use Protege on an overhead projector during face-to-face discussions, so that people could watch the ontology evolve in real-time.    (5T)
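
For concreteness, here is a rough sketch of the kind of thing Howard was building, though not his actual ontology, which I don’t have: a tiny class hierarchy expressed formally, plus a search that exploits the hierarchy. I’m using Python and rdflib here rather than Protege, and every class name below is invented.

```python
# Hypothetical sketch: a tiny formal ontology (class hierarchy) and a search that
# uses it. All names are invented for illustration; this is not Howard's ontology.
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/ohs/")
g = Graph()

# A small class hierarchy: every DialogueMap is a Document, every Document an Artifact.
g.add((EX.Document, RDFS.subClassOf, EX.Artifact))
g.add((EX.DialogueMap, RDFS.subClassOf, EX.Document))
g.add((EX.MailingListPost, RDFS.subClassOf, EX.Document))

# Two items in the knowledge repository, typed against the ontology.
g.add((EX.AskDougAnythingMap, RDF.type, EX.DialogueMap))
g.add((EX.LaunchAnnouncement, RDF.type, EX.MailingListPost))

def instances_of(graph, cls):
    """Find instances of cls or of any of its subclasses (hierarchy-aware search)."""
    classes = set(graph.transitive_subjects(RDFS.subClassOf, cls)) | {cls}
    return {s for c in classes for s in graph.subjects(RDF.type, c)}

# A search for "Document" finds both items, even though neither is typed as Document directly.
print(instances_of(g, EX.Document))
```

The payoff of the formality is in the last line: asking for “Document” turns up the dialogue map and the mailing list post even though neither is labeled “Document” directly, which is the sort of payoff we hoped more intelligent search would deliver.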

Howard made enough progress to make things interesting. He developed some preliminary ontologies from some papers he had read, and he demonstrated and explained this work at one of our first meetings. Unfortunately, things suddenly got very busy for him, and he had to drop out of the group.    (5U)

That was the end of the formal ontology part of the experiment, but not of the experiment itself. First, we helped ourselves by collectively agreeing that developing a shared ontology was a worthwhile goal. This, and picking an end date, helped us eliminate some of the urgency and anxiety about “making progress.” Developing Shared Language can be a frustrating experience, and it was extremely valuable to have group buy-in about its importance up front.    (5V)

Second, we experimented with a facilitation technique called Dialogue Mapping. Despite my complete lack of experience with this technique (I had literally learned it from Jeff Conklin, its creator, the day before our first meeting), it turned out to be extremely useful. We organized a meeting called “Ask Doug Anything,” which I facilitated and captured using a tool called Quest Map. It was essentially the Socratic Method in reverse. We asked questions, and Doug answered them. The only way we were allowed to challenge him or make points of our own was in the form of a question.    (5W)

That meeting was a watershed for me, because I finally understood Doug’s definition of a Dynamic Knowledge Repository. (See the dialogue map of that discussion.) One of the biggest mistakes people make when discussing Doug’s work is conflating Open Hyperdocument System with Dynamic Knowledge Repository. Most of us had made that mistake, which prevented us from communicating clearly with Doug, and from making progress overall.    (5X)

Epilogue    (5Y)

We ended the Launch Community on November 9, 2001, about five months after it launched. We never completed the ontology experiment to my satisfaction, but I definitely learned many things. We also accomplished many of our other goals. We wanted to be a bootstrapping community, Eating Our Own Dogfood, running a lot of experiments, and keeping records of our experiences. We also wanted to facilitate collaboration between our members, most of whom were tool developers.    (5Z)

The experiment was successful enough for me to propose a refined version of the group as an official entity of the Bootstrap Alliance, called the OHS Working Group. The proposal was accepted but, sadly, it got sidetracked. (Yet another story for another time.) In many ways, the Blue Oxen collaboratories are the successors to the OHS Working Group experiment. We’ve adopted many of the same principles, especially the importance of Shared Language and bootstrapping.    (64)

I believe, more than ever, that developing a shared ontology needs to be an explicit activity when collaborating in any domain. In a later post, I’ll discuss where, or whether, Semantic Web technologies fit in.    (65)

Moneyball

I had many reasons for reading Michael Lewis’s latest book, Moneyball: The Art of Winning an Unfair Game. The book is about baseball, which I love, and more specifically, about the Oakland A’s recent run of success. I’ve been living in the Bay Area for about seven years now, and have adopted the A’s as my American League team. (Becoming a Giants fan was not an option, as I remain loyal to my hometown Dodgers.) The book also talks a lot about Paul De Podesta, a fellow alumnus who graduated one year before me.    (1K)

After finishing the book, I discovered another reason for reading Moneyball: it offers insight relevant to my interest in understanding collaboration, as well as a compelling case study of one particular community.    (1L)

Metrics    (1M)

Moneyball is about how economics has changed the game of baseball. Up until the late 1990s, evaluating ballplayers had been an exercise in hand-waving, gut instincts, and misleading measurements of things like speed, strength, and even the structure of a player’s face. These practices had long been institutionalized, and there was little incentive to change.    (1N)

Market forces created that incentive. First, player salaries skyrocketed. It was more acceptable to risk $10,000 on a prospect than it was to risk $10 million. Second, market imbalances created a league of haves and have-nots, where the richest teams could spend three times as much on salaries as the poorest. Small-market teams — like the A’s — had no choice but to identify undervalued (i.e., cheap), overachieving players if they wanted to compete.    (1O)

This new need led baseball executives like Oakland’s Billy Beane to the work of Bill James, a longtime baseball writer with a cult following. James castigated the professional baseball community for its fundamental lack of understanding of the game, and set about reforming the way the game was measured. One problem was an overreliance on subjective observation. James wrote:    (1P)

Think about it. One absolutely cannot tell, by watching, the difference between a .300 hitter and a .275 hitter. The difference is one hit every two weeks. It might be that a reporter, seeing every game that the team plays, could sense that difference over the course of the year if no records were kept, but I doubt it. Certainly, the average fan, seeing perhaps a tenth of the team’s games, could never gauge two performances that accurately — in fact if you see both 15 games a year, there is a 40% chance that the .275 hitter will have more hits than the .300 hitter in the games that you see. The difference between a good hitter and an average hitter is simply not visible — it is a matter of record.    (1Q)
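
James’s 40 percent figure is easy to sanity-check with a simulation. The sketch below makes some assumptions of my own (15 games, four at-bats per game, every at-bat an independent trial, ties not counted as “more hits”), and under those assumptions the .275 hitter out-hits the .300 hitter in the games you see a bit under 40 percent of the time, which is in the same neighborhood as James’s claim:

```python
import random

def sample_hits(avg, games=15, at_bats_per_game=4):
    """Simulate total hits for a hitter over a small sample of games."""
    at_bats = games * at_bats_per_game
    return sum(1 for _ in range(at_bats) if random.random() < avg)

def prob_worse_hitter_looks_better(trials=100_000):
    """Estimate how often a .275 hitter out-hits a .300 hitter over the same 15 games."""
    wins = 0
    for _ in range(trials):
        if sample_hits(0.275) > sample_hits(0.300):
            wins += 1
    return wins / trials

if __name__ == "__main__":
    print(f"{prob_worse_hitter_looks_better():.0%}")  # roughly 35-40%
```

The exact number shifts with the at-bat assumptions, but the point survives: over a sample that small, the eye simply cannot distinguish the two hitters reliably.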

James was not simply a lover of numbers. Baseball was already replete with those. He was an advocate for numbers with meaning. The game had far fewer of those. Errors, for example, are a subjective measure, determined by a statistician watching the game, of whether or not a player should have made a defensive play. They do not account for a player’s ability to get in position to make a play in the first place. As a result, errors are a misleading measure of defensive ability.    (1R)

Lewis makes an important point about James’s work:    (1S)

James’s first proper essay was the preview to an astonishing literary career. There was but one question he left unasked, and it vibrated between his lines: if gross miscalculations of a person’s value could occur on a baseball field, before a live audience of thirty thousand, and a television audience of millions more, what did that say about the measurement of performance in other lines of work? If professional baseball players could be over- or under-valued, who couldn’t? Bad as they may have been, the statistics used to evaluate baseball players were probably far more accurate than anything used to measure the value of people who didn’t play baseball for a living.    (1T)

Extending this line of thinking: how do we measure the effectiveness of collaboration? If we can’t measure it accurately, then how do we know whether we’re getting better or worse at it? Baseball has the advantage of well-defined rules and objectives. The same does not hold for most other areas, including collaboration. Is it even possible to measure anything in these areas in a meaningful way?    (1U)

The Sabermetrics Community    (1V)

James called his search for objective measures in baseball “sabermetrics,” named after the Society for American Baseball Research. Out of his work emerged a community of hard-core fans dedicated to studying sabermetrics. Lewis writes:    (1W)

James’s literary powers combined with his willingness to answer his mail to create a movement. Research scientists at big companies, university professors of physics and economics and life sciences, professional statisticians, Wall Street analysts, bored lawyers, math wizards unable to hold down regular jobs — all these people were soon mailing James their ideas, criticisms, models, and questions.    (1X)

…    (1Y)

Four years into his experiment James was still self-publishing his Baseball Abstract but he was overwhelmed by reader mail. What began as an internal monologue became, first, a discussion among dozens of resourceful people, and then, finally, a series of arguments in which fools were not tolerated.    (1Z)

…    (20)

The swelling crowd of disciples and correspondents made James’s movement more potent in a couple of ways. One was that it now had a form of peer review: by the early 1980s all the statistical work was being vetted by people, unlike James, who had a deep interest in, and understanding of, statistical theory. Baseball studies, previously an eccentric hobby, became formalized along the lines of an academic discipline. In one way it was an even more effective instrument of progress: all these exquisitely trained, brilliantly successful scientists and mathematicians were working for love, not money. And for a certain kind of hyperkinetic analytical, usually male mind there was no greater pleasure than searching for new truths about baseball.    (21)

Bill James had three important effects on the community. First, his writing created a Shared Language that enabled the community to grow and thrive. Second, he Led By Example, which gave him added credibility and created a constant flow of ideas and activity. Third, he answered his mail, and in doing so transformed his monologue into a community dialogue.    (22)

The story of the sabermetrics community’s constant battle with baseball insiders provides a lesson on the dissemination of new ideas. Lewis describes how Major League Baseball largely ignored sabermetricians, even in the face of indisputable evidence, and explains how those impediments continue even today. Sadly, the forces of institutionalization have prevented progress in many organizations and communities, not just baseball.    (23)

A Brief Review    (24)

Moneyball isn’t just a thought-provoking treatise on the objective nature of human affairs. At its core, it’s simply a great baseball book. Lewis is an excellent storyteller, and his retelling of interactions with players such as David Justice, Scott Hatteberg, and Chad Bradford makes you feel like you’re in the clubhouse. More than anything, Moneyball is a fascinating portrayal of Billy Beane, the can’t-miss prospect who missed, and who then turned the Oakland A’s into baseball’s biggest success story.    (25)