Why Are We Afraid of Data?

My friend Gbenga Ajilore is an economics professor. Last month, he gave a great talk at AlterConf in Chicago entitled, “How can open data help facilitate police reform?” The talk concisely explains how data helps us overcome anecdotal bias.

I was particularly struck by his point that we need police buy-in for this data to be truly useful, and it left me with a bit of despair. Why is buy-in on the importance of data so hard to achieve? This should be common sense, right?

Clearly, it’s not. Earlier this year, I expressed some disbelief about how, in professional sports, where there are hundreds of millions of dollars riding on outcomes, there is still strong resistance to data and analytics.

The first time I started to understand the many social forces that cause us to resist data was right after college, when I worked as an editor at a technology magazine. One of my most memorable meetings was with a vendor that made a tool that analyzed source code to surface bugs. All software developers know that debugging is incredibly hard and time-consuming. Feed the tool your source code, and it would easily and automatically identify tons and tons of bugs.

“This is one of the best demos I’ve ever seen!” I exclaimed to the vendor reps. “Why isn’t everyone knocking on your door to buy this?”

The two glanced at each other, then shrugged their shoulders. “Actually,” one explained, “we are having a lot of trouble selling this. When people see this demo, they are horrified, because they realize how buggy their code is, and they don’t have the time or resources to fix it. They would rather that nobody know.”
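For readers who haven’t seen this class of tool, here is a minimal sketch of the kind of check a static analyzer might run. It is purely illustrative and not the vendor’s product: it assumes a Python codebase and uses only the standard library’s ast module to flag bare `except:` clauses, a classic source of silently swallowed errors.

```python
# Illustrative static-analysis check (not the vendor's tool): walk a file's
# syntax tree and flag bare `except:` handlers, which hide every error.
import ast
import sys

def find_bare_excepts(source: str, filename: str = "<unknown>") -> list[int]:
    """Return the line numbers of bare `except:` handlers in the source."""
    tree = ast.parse(source, filename=filename)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]

if __name__ == "__main__":
    # Usage: python find_bare_excepts.py file1.py file2.py ...
    for path in sys.argv[1:]:
        with open(path) as f:
            for lineno in find_bare_excepts(f.read(), path):
                print(f"{path}:{lineno}: bare except swallows all errors")
```

A real analyzer runs hundreds of checks like this at once, which is exactly why the demo was so overwhelming: every run surfaces a backlog nobody budgeted for.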

The Not-So-Mystifying Power of Groupthink and Habits

My friend Greg recently sent me this excellent and troubling Nate Silver article, “There Really Was A Liberal Media Bubble,” on the 2016 presidential election. Silver references James Surowiecki’s book, The Wisdom of Crowds, and suggests that political journalists fail the first three of Surowiecki’s four conditions for wise crowds.

  1. Diversity of opinion (fail)
  2. Independence (fail)
  3. Decentralization (fail)
  4. Aggregation (succeed)

Many of Silver’s points hit very close to home, not just because I believe deeply in the importance of a strong, independent media, but because improving our collective wisdom is my business, and many of the fields I work with — including my own — suffer from these exact same problems.

On diversity, for example, Silver points out that newsrooms lack diversity not only along race, gender, or political lines, but also in how people think:

Although it’s harder to measure, I’d also argue that there’s a lack of diversity when it comes to skill sets and methods of thinking in political journalism. Publications such as Buzzfeed or (the now defunct) Gawker.com get a lot of shade from traditional journalists when they do things that challenge conventional journalistic paradigms. But a lot of traditional journalistic practices are done by rote or out of habit, such as routinely granting anonymity to staffers to discuss campaign strategy even when there isn’t much journalistic merit in it. Meanwhile, speaking from personal experience, I’ve found the reception of “data journalists” by traditional journalists to be unfriendly, although there have been exceptions.

On independence, Silver describes how the way journalism is practiced — particularly in this social media age — ends up acting as a massive echo chamber:

Crowds can be wise when people do a lot of thinking for themselves before coming together to exchange their views. But since at least the days of “The Boys on the Bus,” political journalism has suffered from a pack mentality. Events such as conventions and debates literally gather thousands of journalists together in the same room; attend one of these events, and you can almost smell the conventional wisdom being manufactured in real time. (Consider how a consensus formed that Romney won the first debate in 2012 when it had barely even started, for instance.) Social media — Twitter in particular — can amplify these information cascades, with a single tweet receiving hundreds of thousands of impressions and shaping the way entire issues are framed. As a result, it can be largely arbitrary which storylines gain traction and which ones don’t. What seems like a multiplicity of perspectives might just be one or two, duplicated many times over.

Of the three conditions where political journalism falls short, Silver thinks that independence may be the best starting point for improvement:

In some ways the best hope for a short-term fix might come from an attitudinal adjustment: Journalists should recalibrate themselves to be more skeptical of the consensus of their peers. That’s because a position that seems to have deep backing from the evidence may really just be a reflection from the echo chamber. You should be looking toward how much evidence there is for a particular position as opposed to how many people hold that position: Having 20 independent pieces of evidence that mostly point in the same direction might indeed reflect a powerful consensus, while having 20 like-minded people citing the same warmed-over evidence is much less powerful. Obviously this can be taken too far and in most fields, it’s foolish (and annoying) to constantly doubt the market or consensus view. But in a case like politics where the conventional wisdom can congeal so quickly — and yet has so often been wrong — a certain amount of contrarianism can go a long way.
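Silver’s distinction between independent evidence and repeated evidence can be made concrete with a little arithmetic. The sketch below is my own illustration, not something from his article: it models each journalist’s estimate as truth plus noise, where a parameter controls how much of that noise is shared across the echo chamber, and shows that a correlated crowd’s average is barely better than a single voice.

```python
# Rough illustration (my own, not Silver's): why 20 correlated opinions carry
# less information than 20 independent ones. Each estimate = truth + noise;
# rho is the share of the noise that everyone in the echo chamber has in common.
import numpy as np

rng = np.random.default_rng(0)
truth, n, trials = 0.0, 20, 10_000

def crowd_error(rho: float) -> float:
    """Mean absolute error of the crowd's average estimate over many trials."""
    shared = rng.normal(size=(trials, 1))    # one shared error per trial (the echo chamber)
    private = rng.normal(size=(trials, n))   # each journalist's own independent error
    estimates = truth + np.sqrt(rho) * shared + np.sqrt(1 - rho) * private
    return float(np.abs(estimates.mean(axis=1) - truth).mean())

print(f"20 independent voices:  {crowd_error(0.0):.3f}")  # ~0.18, close to 1/sqrt(20)
print(f"20 echo-chamber voices: {crowd_error(0.8):.3f}")  # ~0.72, not much better than one voice
```

The point is Silver’s, not mine: count the independent pieces of evidence, not the number of people repeating them.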

Maybe he’s right. All I know is that “attitudinal adjustments” — shifting mindsets — are really hard. I was reminded of this by an article about Paul DePodesta and the ongoing challenge of getting professional sports teams to take data seriously.

Basis of what DePodesta and Browns are attempting not new. Majority of NFL teams begrudgingly use analytics without fully embracing concept. Besides scouting and drafting, teams employ analytics to weigh trades, allot practice time, call plays (example: evolving mindset regarding fourth downs) and manage clock. What will differentiate DePodesta and Cleveland is extent to which Browns use data science to influence decision-making. DePodesta would like decisions to be informed by 60 percent data, 40 percent scouting. Present-day NFL is more 70 percent scouting and 30 percent data. DePodesta won’t just ponder scouts’ performance but question their very existence. Will likewise flip burden of proof on all football processes, models and systems. Objective data regarding, say, a player’s size and his performance metrics — example: Defensive ends must have arm length of at least 33 inches — will dictate decision-making. Football staff will then have to produce overwhelming subjective argument to overrule or disprove analytics. “It’s usually the other way around,” states member of AFC team’s analytics staff.

On the one hand, it’s incredible that this is still an issue in professional sports, 14 years after Moneyball was first published and several championships were won by analytics-driven franchises (including two “cursed” franchises, the Boston Red Sox and the Chicago Cubs, both led by data nerd Theo Epstein).

On the other hand, it’s a vivid reminder of how hard habits and groupthink are to break, even in a field where the incentives to be smarter than everyone else come in the form of hundreds of millions of dollars. If it’s this hard to shift mindsets in professional sports, I don’t even want to imagine how long it might take in journalism. It’s definitely helping me recalibrate my perspective about the mindsets I’m trying to shift in my own field.

The Mainstreaming of Analytics

John Hollinger, a long-time ESPN.com columnist and inventor of the Player Efficiency Rating (PER) for evaluating basketball players, is joining the Memphis Grizzlies front office as its Vice President of Basketball Operations.

This is wacky on a number of levels. First, it represents the ongoing rise of the numbers geek in sports, a movement pioneered by Bill James almost 40 years ago, given an identity a decade ago in Michael Lewis’s book, Moneyball, and granted official acceptance in the NBA five years ago, when the Houston Rockets named Daryl Morey its General Manager. Want to run a professional sports team? These days, an MIT degree seems to give you a better chance than years spent in the business.

Second, Hollinger spent over a decade sharing his thinking and his tools for all to see. Now, all his competition needs to do to understand how he evaluates players is to Google him. Tom Ziller writes:

The major difference between Hollinger and, say, Morey is that we all know Hollinger’s theories. We know his positions, and we’ve learned from his work…. Will his canon hurt his ability to make moves? We can lay out exactly which players he likes based on his public formulas and his writings. Other GMs will know which Memphis players he’ll sell low on. You can anticipate his draft choices if you’re picking behind him. If you’ve got a high-production, low-minutes undersized power forward, you know you can goose the price on him because history indicates that Hollinger values him quite seriously.

This is all a gross simplification: Hollinger’s oeuvre is filled with nuance. He doesn’t rank players solely by PER, and in fact he probably has some adjustments to his myriad metrics up his sleeve. He’s not going to be nearly as predictable as a decision-maker as anyone would be as a writer. The stakes are different, the realities of action are different. But no decision-maker in the NBA has had this much of their brain exposed to the world. Morey isn’t shy, but that big Michael Lewis spread on Shane Battier was as far as we ever got into the GM’s gears. Zarren is notoriously careful about what he says. He might be the only GM or assistant GM in the league more secretive than Petrie.

It’s interesting to consider the implications for the Big Data movement in business (on which Moneyball had a much greater influence than most would probably admit). Business is not a zero-sum game like professional sports, so there’s more room for nuance and many positive examples of openness and transparency. Still, for all those who believe that openness and competition do not have to be at odds with each other, this will be fascinating to watch.

Ziller also makes a wonderful point about the importance of communicating meaning from analysis:

In the end, what Hollinger’s hire means is that the ability to do the hard analysis is important, but so is translating that to a language the people on the court can understand. That’s always been a wonderful Hollinger strength: making quant analysis accessible without dumbing it down. Even someone as brilliant as Morey, who has a team of quants, can’t always achieve that.

I’m reminded of a tale from Rick Adelman’s days in Houston. Morey’s team would deliver lengthy scouting reports to the team and coaching staff well before a game. It’d have player tendencies, shooting charts, instructions on match-up advantages — everything you could ask for to prep for a game. And out of all of the coaches and all of the players only two — Shane Battier and Chuck Hayes — would devour the reports. The rest (Adelman included) would leaf through, pretend to care and go play ball. That story might be an exaggeration on the part of the person who told it, but even if that’s the case, it shows how important accessibility is. You can build the world’s greatest performance model. And if you can’t explain what it means to the people using it, it’s worthless.