Folksonomy Taxonomy Philosophy

I love playing The Book of Questions types of games with friends and colleagues, but when it comes to answering those types of questions myself, I’m a terrible waffler. When I play these games with my friend, Steph, she often complains scornfully, “You’re such a ‘P’.” “P” refers to the “Perceiving” Myers-Briggs personality type, which describes folks who are highly context-sensitive (also known as “wafflers”).    (LNM)

Suffice it to say, I hate truisms (except for that one). You could even call me a “philosophical relativist,” which according to Elaine Peterson, would make me a fan of folksonomies. Also true. And in a metaphysical twist that will drive the less philosophically-inclined (and Steph) crazy, if you were to ask me if folksonomies were better than taxonomies, I would respond, “That’s not a valid question.” Folksonomies and taxonomies are not quite apples and oranges, but they’re not apples and apples either. Debating the two is intellectually interesting, but it obscures the real opportunity, which is understanding how the two could potentially augment each other.    (LNN)

The impetus for this little outburst is Gavin Clabaugh’s recent piece on folksonomies. Gavin (who cites Peterson’s essay) argues that taxonomies are better for finding information than folksonomies. Do I agree with that? It depends. Clay Shirky outlined situations where taxonomies are better for search, and vice versa, in his excellent essay, “Ontology is Overrated: Categories, Links, and Tags”.    (LNO)

What troubles me about the claim is that it highlights a distinction I find misleading. In Elaine Peterson’s essay, “Beneath the Metadata: Some Philosophical Problems with Folksonomy,” the main problem she cites has to do with philosophical relativism. Folksonomies allow it; traditional classification does not.    (LNP)

What is philosophical relativism? If I show you a picture of a mono-colored object, is it possible for that object to be both black and white? If you answered yes, you’re a philosophical relativist.    (LNQ)

On the surface, “philosophical relativist” might sound like another term for “dumb as hell.” But, what if the picture was of a person? And what if that person had an African-American father and a Caucasian mother? Now is it possible to classify this photo as both “black” and “white”?    (LNR)

Language is highly context-sensitive. Philosophical relativists acknowledge this. Believe it or not, so do librarians and traditional taxonomists. A taxonomy attempts to make classification more useful by restricting the scope to a single context. If you happen to be operating within that context, then this works great. There are plenty of situations when this is the case (Gavin cites the medical community, which is a great example), but there are also plenty of situations when it’s not.    (LNS)

Folksonomies allow for multiple contexts, but that does not make them inherently less useful than taxonomies. As Clay points out in his essay, in practice, there’s a long tail of tags applied to different concepts. If something is tagged “black” by 98 people and “white” by two, you can be pretty sure that the object in question is “black.” Scale essentially transforms a folksonomy into a taxonomy with a little bit of noise that can easily be filtered out (if desired).    (LNT)
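
To make that concrete, here is a minimal sketch of the filtering idea. The 5 percent cutoff and the tag counts are my own illustrative choices, not a standard value:

```python
from collections import Counter

def consensus_tags(tags, min_share=0.05):
    """Keep only tags applied by at least min_share of all taggers.

    The cutoff is an illustrative choice, not a standard value.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n for tag, n in counts.items() if n / total >= min_share}

# The hypothetical photo from above: tagged "black" 98 times, "white" twice.
tags = ["black"] * 98 + ["white"] * 2
print(consensus_tags(tags))  # {'black': 98} -- the minority tag drops out as noise
```

The particular cutoff doesn’t matter much; the point is that once enough people have tagged something, the dominant labels separate cleanly from the noise.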

Frankly, I think the concern is less about whether taxonomies are inherently better than folksonomies and more about whether so-called experts should have a role in constructing taxonomies. Gavin also alludes to this, when he describes a conversation with two friends in a San Francisco coffee shop. (I don’t want to out those friends, but I will say that one of them runs a company named after the faithful companion of a certain oversized lumberjack from American folklore. I will also say that Gavin is an outstanding tea companion, and that we’re working on a project that has very little to do with folksonomies, but that will make the world a much better place regardless.)    (LNU)

Gavin’s friends suggested that folksonomies were a great way of collaboratively developing a taxonomy. Gavin partially agreed, but expressed some doubt, stating:    (LNV)

Rather than the wisdom of a crowd, I’d recommend the wisdom of a few experts within that crowd. In the end you’d end up with a more accurate and useful taxonomy, with half of the wasted bandwidth, and in probably a tenth of the time.    (LNW)

I can actually think of many situations where I would agree with this. One is Pandora, the music recommendation service built on top of the Music Genome Project. The Music Genome Project is a formal ontology for classifying music developed by 50 musician-analysts over seven years. By all accounts, the service is extraordinarily good. Chris Allen sang its praises to me at the last WikiWednesday, and it was all the rage at the original Bar Camp.    (LNX)

But having experts involved doesn’t preclude using a folksonomy to develop a taxonomy. Is a folksonomy developed by a small group of experts any less of a folksonomy?    (LNY)

In 2002, Kay-Yut Chen, Leslie Fine, and Bernardo Huberman developed a prediction market using Wisdom of Crowds techniques for financial forecasting of a division of HP. The market was 40 percent more accurate than the company’s official forecast. The catch? The people playing the market were the same people doing the official forecast. The difference was not in who was doing the predicting; the difference was in the process.    (LNZ)

I’m a historian by background. I have a great appreciation for the lessons of the past, which is reflected in my patterns-based approach towards improving collaboration. Five years ago, I reviewed Elaine Svenonius’s wonderful book, The Intellectual Foundation of Information Organization, where I wrote:    (LO0)

Fortunately, a small segment of our population, librarians, has been dealing with the problem of information organization since 2000 B.C. Who better to turn to in our time of need than people with thousands of years of accumulated expertise and experience?    (LO1)

There is a tremendous amount of past knowledge that I’m afraid is being written off as trite and irrelevant, when in fact it is even more relevant today. How many people building tagging systems know about Faceted Classification? How many of these developers know of Doug Lenat’s brilliant research on Cyc, or that a huge subset of the Cyc ontology is open source? On the flip side, how many librarians and ontologists are needlessly dismissing folksonomies as inferior, and hence irrelevant?    (LO2)

Philosophical debates over taxonomy and folksonomy are exactly that: philosophy. I love philosophy. I enjoyed Peterson’s essay, and I’d recommend it to others. Curiously enough, David Weinberger, one of folksonomy’s foremost evangelists, is also a philosopher by background. (Read his response to Peterson’s essay.)    (LO3)

However, philosophy sometimes obscures reality, or worse yet, opportunity. We should be focusing our efforts on understanding how taxonomies and folksonomies can augment each other, not on picking sides.    (LO4)

Huberman on Communities of Practice and Forecasting

Bernardo Huberman, the director of the Information Dynamics Lab at Hewlett-Packard, gave a talk at Stanford on January 8 entitled, “Information Dynamics in the Networked World.” Huberman covered two somewhat disparate topics: Automatically discovering Communities Of Practice by analyzing email Social Networks, and a general forecasting technique based on markets.    (TX)

The first part of his talk was a summary of his recent papers. I’ve alluded to this work here many times, mostly in reference to Josh Tyler and SHOCK. The premise of the work is similar to that posed by David Gilmour in his recent Harvard Business Review article.    (TY)

Huberman described a technique for discovering clusters of social networks within an organization by analyzing email. The key innovation is an algorithm for discovering these clusters in linear time. The algorithm, inspired by circuit analysis, treats edges between nodes as resistors. By solving Kirchhoff’s equations, which yields a “voltage” value for each node, the algorithm determines to which cluster a node belongs.    (TZ)
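
Here is a rough sketch of that resistor-network idea, not Huberman’s actual implementation: pin two “pole” nodes at voltages 1 and 0, relax every other node’s voltage toward the average of its neighbors’ (a discrete form of Kirchhoff’s equations with unit resistors), and read cluster membership off the settled voltages. The toy graph and the pole choices are my own:

```python
import numpy as np

def two_cluster_voltages(adj, source, sink, iters=200):
    """Treat each edge as a unit resistor, pin two 'pole' nodes at
    voltages 1 and 0, and repeatedly set every other node's voltage to
    the average of its neighbors'. Nodes settle near the pole they are
    better connected to."""
    n = len(adj)
    v = np.zeros(n)
    v[source] = 1.0
    for _ in range(iters):
        for node in range(n):
            if node in (source, sink):
                continue
            neighbors = np.nonzero(adj[node])[0]
            if len(neighbors) > 0:
                v[node] = v[neighbors].mean()
    return v

# Two loosely connected triangles, {0, 1, 2} and {3, 4, 5}, bridged by edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1

v = two_cluster_voltages(adj, source=0, sink=5)
print(np.round(v, 2), v > 0.5)  # True marks the source-side cluster
```

As I understand it, the speed of the published algorithm comes from needing only a handful of these local averaging passes, rather than a full matrix solve, to separate the clusters.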

The second part of Huberman’s talk was enthralling: Predicting the near-term future using markets. This is not a new idea. A very topical example of a similar effort is the Iowa Electronic Markets for predicting the outcome of presidential elections.    (U0)

The methodology Huberman described (developed by Kay-Yut Chen and Leslie Fine) aggregates the predictions of a small group of individuals. It works in two stages. The first provides behavioral information about the participants, specifically their risk aversion. Huberman remarked that risk aversion is like a fingerprint; an individual’s level of risk aversion is generally constant. In the second stage, participants make predictions using small amounts of real money. The bets are anonymous. Those predictions are adjusted based on risk aversion levels, then aggregated.    (U1)
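
My rough mental model of the aggregation step, with made-up numbers and an illustrative weighting scheme rather than the published formula: each participant reports a probability distribution over the possible outcomes, each report is raised to an exponent derived from that participant’s stage-one risk profile, and the weighted reports are multiplied together and renormalized.

```python
import numpy as np

def aggregate_predictions(reports, weights):
    """Multiply individual probability reports, each raised to a
    participant-specific exponent, then renormalize. The weights stand
    in for the risk-aversion adjustments; the exact scheme is illustrative."""
    reports = np.asarray(reports, dtype=float)   # shape: (participants, outcomes)
    weights = np.asarray(weights, dtype=float)   # one exponent per participant
    log_agg = (weights[:, None] * np.log(reports)).sum(axis=0)
    agg = np.exp(log_agg - log_agg.max())        # subtract the max for numerical stability
    return agg / agg.sum()

# Three participants forecasting three revenue outcomes (low / mid / high).
reports = [
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
    [0.3, 0.4, 0.3],
]
weights = [1.2, 0.8, 1.0]  # hypothetical risk-derived exponents
print(aggregate_predictions(reports, weights))
```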

Huberman’s group set up a toy experiment to test the methodology. There were several marbles in an urn, and each marble was one of ten different colors. Participants were allowed to observe random draws from the urn, and then were asked to bet on the likelihood that a certain color would be drawn. In other words, they were guessing the breakdown of colors in the urn.    (U2)

Although some individuals did a good job of figuring out the general shape of the distribution curve, none came close to predicting the actual distribution. However, the aggregated prediction was almost perfect.    (U3)
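
You can get the flavor of this result from a toy simulation. I’m using plain averaging here instead of the risk-adjusted mechanism, and the numbers of participants, draws, and colors are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical urn: the true (hidden) share of each of ten marble colors.
true_dist = rng.dirichlet(np.ones(10))

# Each of 12 participants sees 30 random draws and estimates the color
# breakdown from their own sample alone.
estimates = []
for _ in range(12):
    draws = rng.choice(10, size=30, p=true_dist)
    counts = np.bincount(draws, minlength=10)
    estimates.append(counts / counts.sum())

individual_errors = [np.abs(e - true_dist).sum() for e in estimates]
aggregate = np.mean(estimates, axis=0)  # simple average, not the HP mechanism
aggregate_error = np.abs(aggregate - true_dist).sum()

print(f"best individual error: {min(individual_errors):.3f}")
print(f"aggregated error:      {aggregate_error:.3f}")
```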

The group then tried the methodology on a real-life scenario: predicting HP’s quarterly revenues. They identified a set of managers who were involved in the official forecast, and asked them to make bets. The official forecast was significantly higher than the actual numbers for that quarter. The aggregated prediction, however, was right on target. Huberman noted that the anonymity of the bets was probably the reason for the discrepancy.    (U4)

The discussion afterwards was lively. Not surprisingly, someone asked about the Policy Analysis Market, the much maligned (and subsequently axed) brainchild of John Poindexter’s Information Awareness Office. Huberman responded that the proposal was flawed, suggesting, predictably, that this technique would have been the right way to implement the market. The proposal was also poorly framed, and because of the vagaries of politics, it will most likely never be reconsidered in any form for the foreseeable future.    (U5)

I see many applications for this technique, and I wonder whether the Institute for the Future and similar organizations have explored it.    (U6)

As an aside, I love how multidisciplinary the work at HP’s Information Dynamics Lab is. I’ve been following the lab for about a year now, and am consistently impressed by the quality and scope of the research there.    (U7)