My friend, Yangsze Choo, recently came out with her third book, The Fox Wife. It’s a murder mystery set in early 20th century northern China, and it’s got some mystical elements as well. It’s entertaining and immersive, and it’s been racking up awards.
Last month, she gave a talk in San Francisco about the book, and someone in the audience asked about her writing process. She explained that there are two kinds of writers: Those who outline, and those who just write. She is apparently one of the latter.
I am astounded by folks who write novel-length works this way. Her revelation reminded me of something I read 30 years ago about Victor Hugo and his thousand-page-plus classic, Les Misérables. Hugo was normally a consummate reviser, except when he wrote Les Misérables. He was so passionate about the political statement he was making that he ended up writing the massive tome cover-to-cover over the course of 20 years. This feat seemed so extraordinary to me that I’ve remembered it clearly for three decades and have thought about it many times.
Too bad I remembered this incorrectly.
Yangsze’s talk and my (what-I-thought-was-correct) memory of Victor Hugo’s feat inspired me to blog about a tension I often see in my work between planning and “going with the flow.” Under normal circumstances, I might have just mentioned the connection and let my thoughts flow from there without doing any additional work. However, I’m generally anal about sourcing, and I’ve also found writing difficult recently, so I decided to see if I could find my original source.
First, I searched the Internet. Nothing, not even a different source repeating the claim. I thought for a moment about where I could have read this. It was definitely in high school, and I didn’t have access to exotic sources back in the day, so it had to be something relatively accessible. Then I pounded my forehead. Of course! It was in the foreword of my copy of Les Misérables!
Fortunately, I still have my original tattered copy on my bookshelf, so I picked it up and started re-reading the foreword, which was written by Lee Fahnestock, one of the translators. According to Fahnestock, Hugo started writing this novel in 1845, then stopped after three years, only to pick it up again a dozen years later.
In 1860 he finally returned to Les Misérables, the book he had never expected to complete, and wrote through to the end. Then, in a move quite uncharacteristic of this writer who preferred to move forward rather than revise, he went back to insert many sections that brought the book into line with his liberalized views and perspectives gained offshore.
I’m not sure if I mis-remembered or mis-read this. Most likely the latter.
I’m realizing that I’m quite fond of reading the front matter in books. Maybe it’s because, upon actually completing a book, writers understand more clearly what they want to say. Maybe it’s because I start many more books than I actually finish. In any case, I recently started reading Marc Hamer’s How to Catch a Mole: Wisdom from a Life Lived in Nature. Hamer writes in his Prologue:
I wonder about truth and what it is as I chase it around and play with it. Recollections rarely come in chronological order. Memory wanders in the darkness, and the harder I try to remember, the more it seems to dissolve in front of me and take a different direction. As soon as I start to examine a story with anything more intense than a sidelong glance, it shifts in reaction to the scrutiny, reconstructs itself and then changes again, like looking into a kaleidoscope: the colours are identical, their patterns slightly different every time, their detail constantly changes yet the picture remains true to itself.
The regular attempts at sensemaking, however, continue. Here’s what I’m learning this week. The usual disclaimer applies: I’m just an average citizen with above average (but very, very rusty) math skills trying to make sense of what’s going on. Don’t trust anything I say! I welcome corrections and pushback!
From the beginning, the main thing I’ve been tracking has been daily new cases country-by-country. Here’s Fagen-Ulmschneider’s latest log graph:
This week’s trend is essentially a continuation of last week’s, which is good news for Italy (whose growth rate is slowing) and bad news for the U.S. (whose growth rate seems more or less consistent).
Early on, I started using a log graph, because it showed the growth rate more clearly, especially in the early days of growth, when curves can look deceivingly flat and linear. Now that some time has passed, one of the challenges of the log graph is becoming apparent: It dulls your sensitivity to how bad things are as you get higher in the graph (and the scale increases by orders of magnitude). You could conceivably look at the above graph and say to yourself, “Well, our curve isn’t flattening, but we’re not that much worse than Italy is,” but that would be a mistake, because you have to pay attention to your scale markers. You don’t have this problem with a linear graph:
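The point about log scales is easy to demonstrate with a few lines of Python. This is just an illustrative sketch using a made-up series growing at 35% per day, not real case data:

```python
import math

# Hypothetical series: 35% daily growth starting from 100 cases.
cases = [100 * 1.35**day for day in range(10)]

# On a linear scale, the day-to-day jumps keep getting bigger...
linear_steps = [b - a for a, b in zip(cases, cases[1:])]

# ...but on a log scale, every step is the same size (ln 1.35),
# so steady exponential growth plots as a straight line.
log_steps = [math.log(b) - math.log(a) for a, b in zip(cases, cases[1:])]

print(linear_steps[0], linear_steps[-1])  # the gaps between days widen
print(log_steps[0])                       # ln(1.35), identical for every step
```

On a log graph, flattening shows up as those equal-sized steps shrinking over time, which is why the slope is so much easier to read there than on a linear graph.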
Yeah, that looks (and is) a lot worse. The other challenge with these graphs is that the daily points create a spikiness that’s unhelpful at best and deceiving at worst. If you’re checking this daily (which I’m doing), you can see a drop one day and think to yourself, “Yay! We’re flattening!”, only to see the curve rise rapidly over the next two. That is, in fact, what happened over the last three days with the national numbers, and it’s an even worse problem as you look at regional data. It would probably be better to show averages over the previous week, or even weekly aggregates instead of daily ones (which might make more sense after a few more weeks).
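The trailing average suggested above is simple to sketch. Here’s a minimal Python version, using made-up daily counts (not real data) to show how a one-day dip gets smoothed out:

```python
def rolling_mean(daily, window=7):
    """Trailing mean: average each day with up to window-1 preceding days."""
    out = []
    for i in range(len(daily)):
        chunk = daily[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical spiky daily counts: the day-7 dip masks the underlying rise...
daily = [100, 120, 90, 150, 140, 180, 60, 210]

# ...but the trailing 7-day average keeps climbing smoothly.
print(rolling_mean(daily))
```

The averaged curve lags the raw data by a few days, which is the usual trade-off: less spikiness in exchange for slower detection of real turns in the curve.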
In addition to the nice interface, one of the main reasons I started using Fagen-Ulmschneider’s dashboard is that he’s tracking state-by-state data as well. He’s even normalizing the data by population. My original impetus for doing my own tracking was that I couldn’t find anyone else normalizing by population. What I quickly realized was that normalizing by population at a national level doesn’t tell you much for two reasons. First, I was mainly interested in the slope of the curve, and normalizing by population doesn’t impact that. Second, outbreaks are regional in nature, and so normalizing by a country’s population (which encompasses many regions) can be misleading. However, I think it starts to become useful if you’re normalizing by a region’s population. I think doing this by state, while not as granular as I would like, is better than nothing. Here’s the state-by-state log graph tracking daily new cases normalized by population:
California (my state) was one of the first in the U.S. to confirm a COVID-19 case. It was also the first to institute a state-wide shelter-in-place directive. And, you can see that the curve seems to have flattened over the past five days. If you play with the dashboard itself, you’ll notice that if you hover over any datapoint, you can see growth data. In the past week, California’s growth rate has gone down from 15% daily (the growth rate over the previous 24 days) to 7% daily. Yesterday, there were 30 new confirmed cases of novel coronavirus per million people. (There are 40 million people in California.)
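For the record, the per-million arithmetic is straightforward. In this sketch, the raw case count is an assumption, chosen only so the numbers line up with the 30-per-million figure above:

```python
# California's population, per the text above.
population = 40_000_000

# Hypothetical raw count of new confirmed cases for the day
# (assumed for illustration; not taken from the dashboard).
new_cases_today = 1_200

# Normalize to new cases per million residents.
per_million = new_cases_today / population * 1_000_000
print(per_million)  # 30.0 new cases per million people
```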
An aside on growth rates. One of the things that’s hard about all these different graphs is that they use different measures for growth rates. Fagen-Ulmschneider chooses to use daily growth percentage, and he shows a 35% growth curve as his baseline, because that was the initial growth curve for most European countries. (Yikes!) Other folks, including the regional dashboard I started following this past week, show doubling rate — the number of days it takes to double.
Finance folks use a relatively straightforward way of estimating the conversion between doubling rate and growth rate. I have a computer, so there’s no reason to estimate. The formula is ln 2 / ln(1 + r), where r is the daily growth rate expressed as a fraction (0.35 for 35%). (The base of the log doesn’t matter, but I use the natural log, because that’s how the Rule of 72 is derived.) However, what I really wanted was a more intuitive sense of how those two rates are related, so I graphed the function:
You can see that the 35% growth rate baseline is equivalent to a doubling of cases roughly every 2.3 days. (Yikes!) Over the past 24 days, California’s growth rate was 15%, which means there was a doubling of cases every five days. Over the past week, the growth rate was 7%, which is the equivalent of doubling approximately every 10 days. (Good job, California!)
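The conversions quoted above are easy to check directly. A minimal Python sketch of the formula:

```python
import math

def doubling_time(daily_growth_rate):
    """Days for cases to double: ln 2 / ln(1 + r), with r as a fraction."""
    return math.log(2) / math.log(1 + daily_growth_rate)

for r in (0.35, 0.15, 0.07):
    print(f"{r:.0%} daily growth doubles cases every {doubling_time(r):.1f} days")
```

Running this gives roughly 2.3, 5.0, and 10.2 days for the three growth rates discussed above.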
Which brings me to the regional dashboard I’ve been using. I love that this dashboard has county data. I also like the overall interface. It’s very fast to find data, browse nearby data, and configure the graph in relatively clean ways. I don’t like how it normalizes the Y-axis based on each region’s curve, which makes it very hard to get a sense of how different counties compare. You really need to pay attention to the growth rate, which it shows as doubling rate. Unlike the above dashboard, it doesn’t show you how the growth rate over the previous seven days compares to the overall growth curve, so it’s hard to detect flattening. My biggest pet peeve is that it doesn’t say who made the dashboard, which makes it harder to assess whether or not to trust it (although it does attribute its data sources), and it doesn’t let me share feedback or suggestions. (Maybe the latter is by design.)
Here’s the California data for comparison:
Another nice thing about this dashboard is that it shows confirmed cases (orange), daily new cases (green), and daily deaths (black). I keep hearing folks say that the reported cases data is useless because of underreporting due to lack of tests. These graphs should help dispel that notion, because the slopes (which indicate growth rates) consistently match as you browse through counties. Also, the overall growth rate shown here (doubling every 5.1 days) is consistent with the data in the other dashboard, so that’s nice validation.
Here’s what the Bay Area looks like:
You can see here what I meant above about comparisons being hard. This graph looks mostly the same as the California graph, but if you look at the scale of the Y-axis and the doubling rate, it’s very different. The Bay Area (which declared shelter-in-place even before the state did) is doing even better, curve-wise. (Good job, Bay Area!)
My next project is to try and get a better sense of what all the death numbers mean. More on that in a future blog post, perhaps. In the meantime, here are some other COVID-19 things I’m paying attention to.
First and foremost, I’m interested in how quickly we create an alternative to shelter-in-place, most likely some variation on test-and-trace. Until we have this in place, lifting shelter-in-place doesn’t make sense, even if we get our curve under control, because the growth rate will just shoot up again. This is nicely explained in Tomas Pueyo’s essay, “Coronavirus: The Hammer and the Dance.” My favorite systems explainer, Nicky Case, has partnered with an epidemiologist to create a dashboard that lets regular folks play with different scenarios. They haven’t released it yet, but this video nicely gives us the gist:
Unfortunately, the media isn’t really talking about what’s happening in this regard (other than the complete clusterfuck that our national response has been), so I have no idea what’s happening. Hang tight, I suppose.
On the other hand, there are some things we can learn from past pandemics. This National Geographic article shares these lessons (and visualizations) from the 1918 flu pandemic, a good warning about lifting shelter-in-place prematurely. (Hat tip to Kevin Cheng.) Similarly, Dave Pollard shares some lessons learned from SARS, several of which are very sobering.
In the meantime, the most pressing concern is hospital capacity. Last week, I mentioned the Institute for Health Metrics and Evaluation’s dashboard, which got some national play too and apparently had a role in waking up our national leadership. Carl Bergstrom, an epidemiologist who also happens to study how disinformation spreads, tweeted some useful commentary on how to (and how not to) interpret this data.
Speaking of disinformation, these are interesting times, not just because of the horrific role that disinformation campaigns are playing in our inability to respond, but also because the pandemic is surfacing, in a more nuanced way, the complicated nature of expertise. FiveThirtyEight published an excellent piece explaining why it’s so hard to build a COVID-19 model. Zeynep Tufekci’s article, “Don’t Believe the COVID-19 Models,” complements the FiveThirtyEight piece nicely. Ed Yong demonstrates how this complexity plays out in his excellent piece on masks. And Philippe Lemoine nicely explains where common sense fits into all of this. (Hat tip to Carmen Medina.)