Wiki Analytics at the Wikithon

I got to put on my hacker hat for a day (a very rare occurrence for me these days) last Wednesday at the Wikithon. After trolling around for ideas, I decided to work on Wiki Analytics with Matthew O’Connor. We ended up dominating the competition and winning the contest for best hack. (So what if there were only two teams eligible for two prizes?)    (LRI)

https://i1.wp.com/farm1.static.flickr.com/150/386792217_6d63faa621_m.jpg?w=700    (LRJ)

Our driving question was: How can we measure the health of a Wiki? I don’t think there is one best way to use a Wiki, but there might only be three or four. If we can start teasing out patterns of Wiki usage, we can better understand how people collaborate with Wikis, which will help us better facilitate Wiki communities and improve Wiki software. Our goal was to tease out the patterns.    (LRK)

We used data from 266 public Socialtext workspaces and Socialtext’s internal corporate workspace. You can read the details of our brainstorming and work on the Socialtext STOSS Wiki. Our approach was to simplify our tasks so that we could have something to show at the end of the day. It was decidedly practical, but it also reflected a deeper philosophy about Wiki Analytics. Start simple and evolve. You can learn interesting things from even simple measurements.    (LRL)

Results    (LRM)

We chose to focus on two types of analysis: page name and graph (link) analysis. I hacked on the former; Matthew on the latter.    (LRN)

Frequent followers of this blog have heard me say it before: Link As You Think is what makes Wikis powerful. The better your page names, the more interlinked your repository will be as you Link As You Think. In order to see if I could measure “good” page names, I looked at three things:    (LRO)

  • Length    (LRP)
  • Number of tokens (words)    (LRQ)
  • Number of non-alphanumeric characters    (LRR)

The hypotheses are straightforward. Shorter names are better. Names with fewer tokens (words) are better. Names without non-alphanumeric characters are better. (This last hypothesis is complicated by internationalization.)    (LRS)
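To make the three measurements concrete, here is a minimal sketch in Python. It is not the code from the Wikithon. The function name `page_name_metrics` is made up for illustration, and two details are assumptions: tokens are split on whitespace, and spaces are not counted as non-alphanumeric characters.

```python
def page_name_metrics(name):
    """Return (length, token count, non-alphanumeric count) for one page name.

    Assumptions (not from the original analysis): tokens are
    whitespace-separated, and whitespace itself is not counted as a
    non-alphanumeric character.
    """
    tokens = name.split()
    # str.isalnum() is Unicode-aware, so letters such as "é" count as
    # alphanumeric. That softens the internationalization problem a bit.
    non_alnum = sum(1 for ch in name if not ch.isalnum() and not ch.isspace())
    return len(name), len(tokens), non_alnum


if __name__ == "__main__":
    for page in ["Wiki Analytics", "What's new?"]:
        print(page, page_name_metrics(page))
```

Under these assumptions, a name like "Wiki Analytics" scores well on all three hypotheses, while punctuation-heavy names stand out in the third number.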

You can read the results of my analysis. The workspaces on the index page are ordered largest to smallest. The top two workspaces are full of spam and can be safely ignored. The numbers on the index page are buggy; click through to the individual pages to see the correct numbers.    (LRT)

Matthew studied the graph characteristics of the Wikis, specifically:    (LRU)

  • Number of links (forward and back) versus number of pages    (LRV)
  • Number of islands (clusters of pages that are strongly connected to each other) and their sizes (number of pages on an island)    (LRW)

Islands of one are orphan pages (not linked to anywhere) and are undesirable. Large islands are better (or at least more interesting) than small ones.    (LRX)
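One way to count islands is to treat the Wiki as a directed graph and find its strongly connected components. The sketch below does that with Kosaraju's two-pass algorithm. It is an illustration, not Matthew's code. The input format (a dict mapping each page name to the pages it links to) and the function names are assumptions.

```python
from collections import Counter


def islands(links):
    """Return the islands (strongly connected components) of a link graph.

    `links` maps a page name to the page names it links to (hypothetical
    input format). Uses Kosaraju's algorithm with explicit stacks.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    forward = {p: list(links.get(p, [])) for p in pages}
    backward = {p: [] for p in pages}
    for source, targets in forward.items():
        for target in targets:
            backward[target].append(source)

    # First pass: record pages in order of DFS completion on forward links.
    visited, order = set(), []
    for start in sorted(pages):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            page, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                # All outgoing links explored: this page is finished.
                order.append(page)
                stack.pop()

    # Second pass: follow backlinks in reverse completion order.
    assigned, result = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        island, stack = set(), [start]
        while stack:
            page = stack.pop()
            island.add(page)
            for prev in backward[page]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        result.append(island)
    return result


def island_size_histogram(links):
    """Map island size to the number of islands of that size."""
    return Counter(len(island) for island in islands(links))
```

For a toy Wiki where `Home` links to `A` and `B`, both link back, and two other pages are never linked to, the histogram shows one island of three pages and two islands of one.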

You can view Matthew’s results on his site.    (LRY)

Analysis    (LRZ)

To give you an idea of what the stats mean, let’s look at four Wikis: stoss, speakers, st-rest-docs, and ivrwiki.    (LS0)

The mean number of characters and number of tokens for page names on each Wiki were:    (LS5)

  • 21.3 / 3.1 (stoss)    (LS6)
  • 18.6 / 2.4 (speakers)    (LS7)
  • 17.4 / 1.7 (st-rest-docs)    (LS8)
  • 39.3 / 6.7 (ivrwiki)    (LS9)

On the surface, the two Wikis with the middle values, stoss and speakers, seem to have hit the sweet spot for page names: roughly two to three words per name. Since stoss is meant to be a collaborative workspace for a larger community, this seems to be a healthy number. The speakers Wiki is a repository of potential speakers. Since the majority of pages consist of people’s names, the numbers (two, sometimes three words in a page name) make sense.    (LSA)

The remaining two Wikis diverge enough from this minute data set that we can infer some different patterns of usage. st-rest-docs documents Socialtext’s REST API, so there are a lot of one-word page names representing method names. Even though the average number of tokens is smaller, the average name length is comparable to that of the two Wikis in the middle. This also makes sense, given that the methods in a REST API are actually URI paths, which can get somewhat long.    (LSB)

On the surface, ivrwiki seems to exhibit the classic signs of a newbie dumping ground, with page names that are too long to be useful. However, if you dig deeper, you can see that that’s not the case. The standard deviation of number of tokens is quite large (4.2), indicating a flat distribution curve. In other words, while there are a lot of long names, there are also a lot of short names. If you dig even further, you’ll see that the community is using the Wiki as a question repository, and questions naturally have lots of words. Additionally, there seems to be a lot of more traditionally “Wiki-like” behavior on that Wiki.    (LSC)
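To see why a mix of short names and long questions shows up as a large standard deviation, here is a small sketch using Python's `statistics` module. The page names are invented for illustration, not taken from ivrwiki. Whether the original numbers used the population or the sample standard deviation is not stated, so the choice of `pstdev` here is an assumption.

```python
from statistics import mean, pstdev


def token_stats(page_names):
    """Return (mean, population standard deviation) of tokens per page name."""
    counts = [len(name.split()) for name in page_names]
    return mean(counts), pstdev(counts)


if __name__ == "__main__":
    # Hypothetical names: two short titles and two long questions.
    names = [
        "Home",
        "FAQ",
        "How do I reset my voicemail password",
        "Why does the call drop after the greeting plays",
    ]
    print(token_stats(names))
```

The mean alone (4.5 tokens here) would suggest uniformly long names. The spread shows that short titles and long questions are living side by side.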

This was no accident. The reason I’m showcasing ivrwiki is that Matthew identified it as an “interesting” Wiki from his graph analysis. Look at the numbers. There are three sizes of islands: 19 of one page, one of 16 pages, and one of 353 pages! That’s one big island! It indicates a fairly tight set of linkages across the majority of the pages on a Wiki. Dig a bit deeper, and you can see the hub of the cluster: the Knowledge Base Index page. It links to every page in the knowledge base, and every page in the knowledge base links back to this page.    (LSD)

The st-rest-docs Wiki exhibits similar behavior — one big island of 81 pages. This makes sense, given that this Wiki represents documentation, which is structured in a similar way to the ivrwiki knowledge base.    (LSE)

The stoss Wiki is the most Wiki-like of the four when you dig into the graph analysis. There are five sizes of islands, the largest containing 10 pages. The distribution is fairly regular — based on my guess of what “regular” should be, at least. To really get a sense of what “regular” should be, we’ll need to identify several Wikis that we consider to be “Wiki-like,” and examine those numbers.    (LSF)

Finally, look at the numbers for the speakers Wiki. They are the reverse of those for the other Wikis. There is basically no clustering; all of the pages are islands in and of themselves. At first glance, this is surprising. You would expect it to look somewhat like ivrwiki and st-rest-docs. The reason for the lack of clustering is that this Wiki relies on Socialtext’s tagging interface for navigation. Tags could be treated as a type of link, but we don’t treat them that way in our analysis.    (LSG)

Thoughts    (LSH)

As with any simplified analysis, there are always caveats. A lot of them are specific to the Wiki implementation. For example, several people at Socialtext use the stoss Wiki as a blog, which creates long page names and thus skews the statistics. Other Wikis may be similar to the speakers Wiki in that they use tags as navigational links.    (LSI)

There’s an open question as to whether a Wiki should be treated as a directed graph. We chose to treat it as directed, but you can make a good argument that the Socialtext Wiki acts as an undirected graph, or at least a bidirectional one, because Backlinks are displayed on the page itself. The same holds true with any other Wiki depending on the navigational context. If I start at the home page and start navigating around, I can often use the browser back button to go back, or at worst, I can click on “Backlinks” to figure out the context.    (LSJ)
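The choice matters for the island counts. As a rough sketch (not our Wikithon code; the input format and the function name are assumptions), here is what islands look like if every link is treated as two-way, which is roughly what visible Backlinks give a reader:

```python
def undirected_islands(links):
    """Return islands when every link is treated as bidirectional.

    `links` maps a page name to the page names it links to (hypothetical
    input format). An island is then a connected component.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Record each link in both directions.
    neighbours = {p: set() for p in pages}
    for source, targets in links.items():
        for target in targets:
            neighbours[source].add(target)
            neighbours[target].add(source)

    seen, result = set(), []
    for start in sorted(pages):
        if start in seen:
            continue
        seen.add(start)
        island, stack = set(), [start]
        while stack:
            page = stack.pop()
            island.add(page)
            for other in neighbours[page]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        result.append(island)
    return result


if __name__ == "__main__":
    # A one-way chain A -> B -> C, plus a page D with no links at all.
    chain = {"A": ["B"], "B": ["C"], "C": [], "D": []}
    print(undirected_islands(chain))
```

In the one-way chain, no page can reach an earlier one by forward links alone, so a strictly directed reading gives three islands of one. The two-way reading merges them into a single island of three, and only the unlinked page stays alone.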

I’m not sure the page name analysis is that interesting by itself. I think it gets very interesting when applied to the specific islands on a Wiki. People may be using a Wiki in a number of different ways, as demonstrated by the ivrwiki. Analysis on each individual cluster will potentially surface the different kinds of behaviors on a Wiki, which is more appropriate than trying to slap on a single archetype if one does not exist.    (LSK)

Finally, what level of clustering is healthy? In systems theory, networks that are either too tightly clustered or too lightly clustered are problematic. With enough analysis, we may be able to speculate on the right number for Wikis.    (LSL)

Matthew and I will release our code at some point, and we’ll hopefully have some time to follow up on it as well. Specifically, I’d like to examine a lot of other Wikis, starting with the ones that Blue Oxen Associates hosts.    (LSM)

There were a lot of other hacks at the Wikithon that were cool. My favorites were Ingy dot Net’s Social Zork (which was not only hilarious, but is actually potentially useful) and Shawn Devlin’s Word Cloud, which I hope to use on other Wikis. Christine Herron wrote a good summary of the day’s festivities.    (LSN)

February 2007 Update

A month has passed, and the blog has been silent, but the brain has not. Time to start dumping again. But before I begin, a quick synopsis:    (LR8)

  • The month started off inauspiciously, with a catastrophic system failure that occurred over the holidays. Quite the story. I hope to tell it someday.    (LR9)
  • Last year, I joined the board of the Leadership Learning Community (LLC). It was an unusual move on my part, since I was also in the process of clearing commitments off my list in order to focus more on my higher-level goals. In the midst of saying no to many, many people, I found myself saying yes to LLC. We had our first 2007 board meeting earlier this month, and I participated in their subsequent learning circles. Let’s just say I have no regrets. A week with these folks generated enough thoughts to fill a thousand blog posts.    (LRA)
  • This past week, I co-facilitated a three-day Lunar Dust Workshop for NASA, using Dialogue Mapping and Compendium. It was an unbelievable experience, also worth a thousand blog posts. For now, check out some pictures.    (LRB)
  • For the past few months, I’ve been actively involved with a project called Grantsfire. The project’s goal is modest: Make foundations and nonprofits more transparent and collaborative. How? For starters, by getting foundations to publish their grants as microformats. I’ve hinted about the project before, and I’ll have much more to say soon.    (LRC)
  • For the past year, I’ve been helping reinvent Identity Commons. Again, I haven’t blogged much about it, but I’ve certainly talked a lot about it. Not only are we playing an important role in the increasingly hot Internet identity space, we’re also embodying a lot of important ideas about facilitating networks and catalyzing collaboration.    (LRD)

In addition to a flood of blog posts, other things to look forward to this month include:    (LRE)

Kirsten Jones on Perlcast

Socialtext’s Kirsten Jones talks about Socialtext Open and its REST API on this week’s Perlcast. It’s a good interview, and the last question reminded me of a funny exchange at WikiWednesday this past week:    (LH0)

Kirsten: PHP is for girls.    (LH1)

Evan Prodromou: Hey, Michele would object to that!    (LH2)

Kirsten: That’s because she’s a woman. PHP is for girls, Perl is for women.    (LH3)

WikiWednesday and Web Mondays

Two “days” coming up worth attending, for those of you in the Bay Area. Tomorrow is WikiWednesday at Socialtext in Palo Alto. Three good reasons to go:    (LGI)

Next Monday is the third WebMonday Silicon Valley, this time at Cooley Godward Kronish in Palo Alto. Sadly, I won’t be able to make this one, but I spoke at the last one, and I had an excellent time.    (LGM)

BarBar Redux

Thanks to those of you who dropped by BarBar last night! Not surprisingly, given that Scott McMullan and I organized it, it was a very Wiki-oriented crowd: folks from JotSpot, Socialtext, and Atlassian were there to relax.    (LAT)

https://i2.wp.com/static.flickr.com/120/260460715_429ee26d12_m.jpg?w=700    (LAU)

If you want to know what makes Silicon Valley great, this picture says it all. Where else in the world is it commonplace for competitors to get together for beers after work and talk openly about their work and their lives? We had great conversation (not all of it Wiki-related), and I had the chance to preach WikiOhana to my enterprisey peers.    (LAV)

The highlight of my evening was enjoying the sweet fare of the Tamale Lady for the first time.    (LAW)

https://i2.wp.com/static.flickr.com/94/260460413_25cb3c9a58_m.jpg?w=700    (LAX)

They were ridiculously tasty. How is it that I’ve lived in the Bay Area for over ten years, and I had never heard of the Tamale Lady before? Ah well, now I’m in the know (and so are you).    (LAY)