Earlier this year, in a blog post on Faster Than 20 about George Floyd, I tried to point out that, as terrible and visceral as his murder was, the overall racial disparity in police killings should feel far more horrifying. But, I explained:
No one has ever looked at a number and taken to the streets. There are lots of mental hoops required to make sense of that number, to trust its implications, and then to get outraged by it.
Later, in an exchange with a colleague in the comments, I wrote:
There’s also a larger question worth asking about whether the 1,000 police killings a year is too high, regardless of what you think of the racial disparity, which gets you into questions about police militarization and policies for community safety in general.
I suppose now is as good a time as any to ask the larger question: Is 1,000 police killings a year too high?
All things being equal, my first guess as to what the “right” number of police killings should be is zero. Hard to argue with that, right?
Well, that depends. Consider a school shooting, for example. If somebody is spraying bullets at a school with the clear intent to kill as many people as possible, I definitely want the police to shoot and kill that person. It’s not hard to think of other situations where a police killing is not only justified, but where it might save many other lives.
So the “right” number of police killings is probably greater than zero. But how much greater?
I might try going down that rabbit hole another day, but I want to pivot to a different question: How many COVID-19 deaths are too high?
As of today, 240,000 people have officially died of COVID-19 in the U.S. (This doesn’t count indirect fatalities, which would put the number well over 300,000.) Over the past week, we’ve averaged 940 deaths a day from COVID-19. On the one hand, it’s less than half of our peak on April 24, when we averaged 2,240 deaths a day. On the other hand, the number is trending in the wrong direction.
Is a thousand deaths a day too much? What would an “acceptable” number of daily deaths be?
Let’s try to think of this question in a different way. How many car deaths per day are too many? How many car deaths per day are “acceptable”? Don’t do any research. Just try to come up with two numbers and some explanation as to how you came up with them. Don’t worry about being “right.” This is simply an experiment.
Got an answer? Okay, suppose that you’re surpassing your “too many” number. What would you do to get those numbers down?
Think about this for a second. Now compare your numbers to the 2016 U.S. numbers listed in this Wikipedia page.
I don’t have good answers to any of these questions. (I’d love to hear yours in the comments below.) I think that a thousand deaths a day is too many, but I really can’t justify the tradeoffs.
I do know two things. First, human intuition is pretty much useless when it comes to these questions. Joseph Stalin supposedly said, “The death of one man is a tragedy. The death of millions is a statistic.” It turns out that this is a fact of human nature. It’s known as psychic numbing.
Second, economists estimate that the value of one human life in the U.S. is roughly $10 million. So 240,000 deaths is equivalent to the loss of $2.4 trillion, over 10 percent of our GDP last year. By these admittedly crass and undoubtedly wrong estimates, it seems like a 10 percent drop in GDP is worth the tradeoff of saving 240,000 lives.
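Here is that back-of-envelope arithmetic as a small Python sketch. The per-life value and the death count are the figures above. The GDP number (roughly $21.4 trillion for 2019) is an assumption I'm plugging in for illustration, so treat the resulting percentage as approximate.

```python
# Back-of-envelope: implied dollar value of lives lost vs. one year of GDP.
VALUE_PER_LIFE_USD = 10_000_000   # rough economists' estimate cited above
DEATHS = 240_000                  # official U.S. COVID-19 deaths cited above
GDP_2019_USD = 21.4e12            # ASSUMPTION: approximate 2019 U.S. GDP


def statistical_loss(deaths: int, value_per_life: float) -> float:
    """Total dollar value implied by a per-life estimate."""
    return deaths * value_per_life


loss = statistical_loss(DEATHS, VALUE_PER_LIFE_USD)
share_of_gdp = loss / GDP_2019_USD

print(f"Implied loss: ${loss / 1e12:.1f} trillion")
print(f"Share of GDP: {share_of_gdp:.0%}")
```

Changing either constant shows how sensitive the "tradeoff" is to the per-life estimate, which is the shakiest input here.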
The regular attempts at sensemaking, however, continue. Here’s what I’m learning this week. The usual disclaimer applies: I’m just an average citizen with above average (but very, very rusty) math skills trying to make sense of what’s going on. Don’t trust anything I say! I welcome corrections and pushback!
From the beginning, the main thing I’ve been tracking has been daily new cases country-by-country. Here’s Fagen-Ulmschneider’s latest log graph:
This week’s trend is essentially a continuation of last week’s, which is good news for Italy (whose growth rate is slowing), and bad news for the U.S. (whose growth rate seems more or less consistent).
Early on, I started using a log graph, because it showed the growth rate more clearly, especially in the early days of growth, when curves can look deceivingly flat and linear. Now that some time has passed, one of the challenges of the log graph is becoming apparent: It dulls your sensitivity to how bad things are as you get higher in the graph (and the scale increases by orders of magnitude). You could conceivably look at the above graph and say to yourself, “Well, our curve isn’t flattening, but we’re not that much worse than Italy is,” but that would be a mistake, because you have to pay attention to your scale markers. You don’t have this problem with a linear graph:
Yeah, that looks (and is) a lot worse. The other challenge with these graphs is that the daily points create a spikiness that’s not helpful at best and deceiving at worst. If you’re checking this daily (which I’m doing), you can see a drop one day and think to yourself, “Yay! We’re flattening!”, only to see the curve rise rapidly the next two. That is, in fact, what happened over the last three days with the national numbers, and it’s an even worse problem as you look at regional data. It would probably be better to show averages over the previous week or even weekly aggregates instead of daily (which might make more sense after a few more weeks).
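For what it's worth, averaging over the previous week is easy to sketch. This is a minimal trailing average in Python, not what any of these dashboards actually do. The function name, the seven-day default, and the sample numbers are all made up for illustration.

```python
from typing import List


def trailing_average(daily: List[float], window: int = 7) -> List[float]:
    """Average each day with up to `window - 1` preceding days.

    Early days use however many values are available, so the output
    has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_total = 0.0
    for i, value in enumerate(daily):
        running_total += value
        if i >= window:
            running_total -= daily[i - window]  # drop the day that left the window
        smoothed.append(running_total / min(i + 1, window))
    return smoothed


# Hypothetical daily counts with one-day dips that are not real flattening.
example = [100, 120, 90, 150, 180, 160, 210, 140, 260, 300]
print([round(x) for x in trailing_average(example)])
```

The smoothed series lags the raw one by a few days, which is the price of not getting fooled by a single good day.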
In addition to the nice interface, one of the main reasons I started using Fagen-Ulmschneider’s dashboard is that he’s tracking state-by-state data as well. He’s even normalizing the data by population. My original impetus for doing my own tracking was that I couldn’t find anyone else normalizing by population. What I quickly realized was that normalizing by population at a national level doesn’t tell you much for two reasons. First, I was mainly interested in the slope of the curve, and normalizing by population doesn’t impact that. Second, outbreaks are regional in nature, and so normalizing by a country’s population (which encompasses many regions) can be misleading. However, I think it starts to become useful if you’re normalizing by a region’s population. I think doing this by state, while not as granular as I would like, is better than nothing. Here’s the state-by-state log graph tracking daily new cases normalized by population:
California (my state) was one of the first in the U.S. to confirm a COVID-19 case. It was also the first to institute a state-wide shelter-in-place directive. And, you can see that the curve seems to have flattened over the past five days. If you play with the dashboard itself, you’ll notice that if you hover over any datapoint, you can see growth data. In the past week, California’s growth rate has gone down from 15% daily (the growth rate over the previous 24 days) to 7% daily. Yesterday, there were 30 new confirmed cases of novel coronavirus per million people. (There are 40 million people in California.)
An aside on growth rates. One of the things that’s hard about all these different graphs is that they use different measures for growth rates. Fagen-Ulmschneider chooses to use daily growth percentage, and he shows a 35% growth curve as his baseline, because that was the initial growth curve for most European countries. (Yikes!) Other folks, including the regional dashboard I started following this past week, show doubling rate — the number of days it takes to double.
Finance folks use a relatively straightforward way of estimating the conversion between doubling rate and growth rate. I have a computer, so there’s no reason to estimate. The formula is ln 2 / ln(1 + r), where r is the daily growth rate written as a fraction (0.15 for 15%). (The base of the log doesn’t matter, but I use a natural log, because that’s how the Rule of 72 is derived.) However, what I really wanted was a more intuitive sense of how those two rates are related, so I graphed the function:
You can see that the 35% growth rate baseline is equivalent to a doubling of cases every 2.3ish days. (Yikes!) Over the past 24 days, California’s growth rate was 15%, which means there was a doubling of cases every five days. Over the past week, the growth rate was 7%, which is the equivalent of doubling approximately every 10 days. (Good job, California!)
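If you'd rather compute than read values off a graph, here is a small Python sketch of the conversion in both directions. It assumes the growth rate is a daily fraction (0.15 for 15%), so each day's count is the previous day's multiplied by 1 + rate. The function names are mine.

```python
import math


def doubling_time(rate: float) -> float:
    """Days to double, given daily growth as a fraction (0.15 means 15%)."""
    if rate <= 0:
        raise ValueError("doubling time is only defined for positive growth")
    return math.log(2) / math.log(1 + rate)


def growth_rate(doubling_days: float) -> float:
    """Inverse: daily growth fraction implied by a doubling time in days."""
    if doubling_days <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (1 / doubling_days) - 1


for rate in (0.35, 0.15, 0.07):
    print(f"{rate:.0%} daily growth -> doubles every {doubling_time(rate):.1f} days")
```

For 15% and 7% this comes out at roughly five and ten days. Going the other way, `growth_rate(5.1)` tells you what daily percentage a "doubling every 5.1 days" dashboard is reporting.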
Which brings me to the regional dashboard I’ve been using. I love that this dashboard has county data. I also like the overall interface. It’s very fast to find data, browse nearby data, and configure the graph in relatively clean ways. I don’t like how it normalizes the Y-axis based on each region’s curve, which makes it very hard to get a sense of how different counties compare. You really need to pay attention to the growth rate, which it shows as doubling rate. Unlike the above dashboard, it doesn’t show you how the growth rate over the previous seven days compares to the overall growth curve, so it’s hard to detect flattening. My biggest pet peeve is that it doesn’t say who made the dashboard, which makes it harder to assess whether or not to trust it (although it does attribute its data sources), and it doesn’t let me share feedback or suggestions. (Maybe the latter is by design.)
Here’s the California data for comparison:
Another nice thing about this dashboard is that it shows confirmed cases (orange), daily new cases (green), and daily deaths (black). I keep hearing from folks saying that the reported cases data is useless because of underreporting due to lack of tests. These graphs should help dispel this, because — as you browse through counties — the slopes (which indicate growth rates) consistently match. Also, the overall growth rate shown here (doubling every 5.1 days) is consistent with the data in the other dashboard, so that’s nice validation.
Here’s what the Bay Area looks like:
You can see what I meant above about being hard to compare. This graph looks mostly the same as the California graph, but if you look at the scale of the Y-axis and the doubling rate, it’s very different. The Bay Area (which declared shelter-in-place even before the state did) is doing even better, curve-wise. (Good job, Bay Area!)
My next project is to try and get a better sense of what all the death numbers mean. More on that in a future blog post, perhaps. In the meantime, here are some other COVID-19 things I’m paying attention to.
First and foremost, I’m interested in how quickly we create an alternative to shelter-in-place, most likely some variation on test-and-trace. Until we have this in place, lifting shelter-in-place doesn’t make sense, even if we get our curve under control, because the growth rate will just shoot up again. This is nicely explained in Tomas Pueyo’s essay, “Coronavirus: The Hammer and the Dance.” My favorite systems explainer, Nicky Case, has partnered with an epidemiologist to create a dashboard that lets regular folks play with different scenarios. They haven’t released it yet, but this video nicely gives us the gist:
Unfortunately, the media isn’t really talking about what’s happening in this regard (other than the complete clusterfuck that our national response has been), so I have no idea what’s happening. Hang tight, I suppose.
On the other hand, there are some things we can learn from past pandemics. This National Geographic article shares these lessons (and visualizations) from the 1918 flu pandemic, a good warning about lifting shelter-in-place prematurely. (Hat tip to Kevin Cheng.) Similarly, Dave Pollard shares some lessons learned from SARS, several of which are very sobering.
TL;DR I’m now using this dashboard as a way to make sense of what’s happening with COVID-19. It’s still too soon to draw any conclusions about how well the U.S.’s interventions overall are working.
I started trying to make sense of the COVID-19 growth rate data myself on March 13, 16 days ago. I learned a lot along the way, and my daily ritual of looking up numbers and updating my spreadsheet has been strangely calming. Here’s my latest graph:
Italy’s growth rate seems to be flattening, which is a positive sign.
U.S.’s growth curve continues to rise at a steady rate; more on this below.
Even though China and Korea’s growth rates have been steady for a while now, it’s not zero. They have this under control (for now), but it’s far from over, and it won’t be until we have a vaccine, which folks keep saying is at least 12-18 months away.
My friend, Scott Foehner, chided me last week for saying that the results are lagging by about a week. He’s right. Based on Tomas Pueyo’s analysis (which I cited in my original blog post), the lag is more like 12 days. This is why the Bay Area shelter-in-place ordinance was for three weeks — that’s how much time you need to see if you’re containing your growth rate.
Shelter-in-place in the Bay Area started on March 17, exactly 12 days ago and four days after I started tracking. California’s order started on March 20. Other states followed after that, but not all.
It’s hard to make sense of all this when aggregated as a country. I’ve been wanting regional data for a while now, but have felt too overwhelmed to parse it out myself. Fortunately, other people have been doing this.
One of the positive outcomes of me doing this for myself for the past few weeks is that it’s given me a better sense of how to interpret other people’s graphs, and it’s helped me separate the wheat from the chaff. It’s also made me realize how poor data literacy seems to be for many media outlets, including major ones. They’re contributing to the problem by overwhelming people with graphs that are either not relevant or are not contextualized.
Wade’s dashboard is pretty configurable overall, although you have limited control over which region’s data you’re showing. Here’s the closest equivalent to what I’ve been tracking:
And here’s what I’ve really wanted to see: the state-by-state data:
What does this tell us about the interventions so far? Again, not much. It’s too soon. Check back in another week.
I’ve seen some articles floating around with graphs comparing California to New York, crowing that sheltering-in-place is already working here. That may be the case, but it’s still too early for us to know that, and it’s irresponsible to point to a chart and suggest that this is the case. There are lots of reasons why New York might be doing so poorly compared to California that have nothing to do with interventions, density being the obvious one. Regardless, history has proven that even a few days can make a huge difference when it comes to containing epidemics, and I feel incredibly grateful that our local leaders acted as quickly as they did.
I think there are two questions that are on people’s minds. One is about hospital capacity. I’ve seen various attempts to model this, including the Covid Act Now site I mentioned last week. The one I find easiest to browse is this dashboard from the Institute for Health Metrics and Evaluation. They publish their model, which I haven’t even attempted to parse yet. (I doubt that I have the expertise to evaluate it anyway.) It suggests that, even if our current measures have flattened the curve in California, we’ll still need more ICU beds than we have in about two weeks, although we should be okay in terms of general hospital capacity.
The second question is how much longer we’ll need to shelter-in-place (or worse). Even if we flatten the curve, lifting shelter-in-place will just get that curve going again unless we have an alternative way of managing it (e.g. test-and-trace). I haven’t seen any indications of when that will happen, so we’ll just have to continue to be patient. I feel like every day is a grind, and I’m one of the lucky ones. I can’t imagine how folks on the frontlines and those far less fortunate than me are dealing right now.
Many thanks for all of the feedback about my latest attempt to make sense of the Coronavirus pandemic. I listened, and I played, and I learned, and I now have a new graph that I think better represents how the U.S. is doing:
This is a semi-logarithmic graph of daily new cases over time. I’m comparing the U.S. (blue and bold), China (red), South Korea (yellow), and Italy (green). I’ll explain the changes I made from last time below, but first, three quick takeaways:
First, don’t trust my analysis. I’m an amateur at this, my math is incredibly rusty, and it turns out that my statistics (which were always suspect) are even more suspect than I thought. Critiques, corrections, and constructive discussion encouraged!
Second, don’t passively consume what you read. This started off as a quick exercise to try to make sense of the craziness. It’s led to lots of encouragement, but also lots of (welcome) critiques, which have helped me sharpen my analysis, correct some assumptions, and feel like I have a better grasp of what’s going on and how to assess other things I read. I feel a lot better than I did a few days ago.
Third, the U.S. isn’t doing great right now. Our line is more or less tracking China, Korea, and Italy’s initial growth rate, but both China and Korea had started slowing their growth rate about a week before we did. Our curve is looking a lot more like Italy’s, which does not speak well of the weeks ahead here.
Here are the changes I made from last time:
First, I switched over to a semi-logarithmic graph. Hat tip to Ken Chase for encouraging me to do this and to Matt Bruce for this pointer as to why this is important.
When I originally started mapping this out, I didn’t think that the semi-logarithmic graph would tell me much more than the linear graph did. Italy was off the chart, which was all I felt like I needed to know, and I felt like I could make sense of the rest of the curves on the chart. Still, after receiving this feedback, I decided to try the semi-logarithmic version to see what I would learn. As you can see above, my conclusion changed quite dramatically. We in the U.S. are not doing well right now. You can see that the slope of our graph tracks quite closely with Italy.
The other (more math-y) benefit is that I can measure the slope of the line (0.15), which gives me the rough exponential growth curve for how viral Coronavirus has been in both the U.S. and Italy (y = 10^(0.15x)). China and Korea’s containment slope (-0.1) provides the rough exponential decay curve for the potential impact of containment (y = 10^(-0.1x)). You can use these equations to model out different scenarios (which my friend, Charlie Graham, has been doing).
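To give a feel for what those two slopes mean day to day, here is a rough Python sketch. The slopes are the ones I read off the graph. The helper names are mine, and this is just the straight-line extrapolation, not Charlie's model or anyone's real forecast.

```python
import math

GROWTH_SLOPE = 0.15        # log10 slope read off the graph for the U.S. and Italy
CONTAINMENT_SLOPE = -0.10  # log10 slope read off for China and Korea in containment


def daily_factor(slope: float) -> float:
    """Day-over-day multiplier implied by a log10 slope."""
    return 10 ** slope


def days_to_change_by(factor: float, slope: float) -> float:
    """Days for daily cases to be multiplied by `factor` along this slope."""
    return math.log10(factor) / slope


def project(start_cases: float, slope: float, days: int) -> float:
    """Daily new cases after `days`, following y = start * 10**(slope * days)."""
    return start_cases * 10 ** (slope * days)


print(f"Growth: x{daily_factor(GROWTH_SLOPE):.2f} per day, "
      f"doubling every {days_to_change_by(2, GROWTH_SLOPE):.1f} days")
print(f"Containment: x{daily_factor(CONTAINMENT_SLOPE):.2f} per day, "
      f"halving every {days_to_change_by(0.5, CONTAINMENT_SLOPE):.1f} days")
```

Calling `project` with a starting count of your choosing and a number of days gives the scenario numbers. Keep in mind that a slope measured by eye on a chart is only good for a rough sense of scale.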
Second, I’m no longer normalizing by population. Two people questioned whether or not this was useful. (Thank you, Majken Longlade and Corey O’Hara.) Their argument was that normalizing by population doesn’t tell you much in the case of epidemics, because transmission and virality are more a function of closeness and density. The U.S. is a huge country geographically compared to Korea or Italy. The better approach would be to try to normalize based on population of regional outbreaks.
I agree, and I’d like to try to do this. The data is a lot harder to come by, but I think it would be possible to manually pull together with a little bit of elbow grease, especially if we’re constraining the countries where we’re trying to do this. (Leave a comment below if you’d be interested in trying this.)
(Side note: I am extremely grateful for the wide availability of data and for all the people doing an incredible job of analyzing and sharing. This is no accident. A lot of folks have invested an incredible amount of time and energy over the past two decades advocating for the open web and open data, all of which is required to make this work. This becomes even more clear when realizing what’s missing. Martin Cleaver asked me to include Canada in my graph. Easy enough, I thought. Turns out it’s not. Canada doesn’t make this data available. Majken shared this article that explains Canada’s situation.)
Nevertheless, I thought it was still useful to look at data normalized by population. My reasoning was that, at worst, it wouldn’t make the data worse, and, at best, it might make it better. I decided I would try to test this assumption by comparing the graphs of the normalized data with the non-normalized data above:
Again, I think switching to a semi-logarithmic graph made a difference, because if you compare these two graphs, the slopes (which I’m most concerned about) are largely the same. Normalizing the data doesn’t impact the slopes. On the one hand, my assumption was correct: normalizing didn’t seem to hurt the data. On the other hand, it also didn’t tell me anything new, either. So, I decided to stop normalizing and stick with the data as is. (I’d still like to try normalizing by outbreak region, though.)
One point that came up often was that these graphs don’t take into account the underreporting in the U.S. due to lack of testing. I tried to take this into account in my very first sketch, as I mentioned in my original blog post. However, I decided to move away from this for a few reasons. First, every country is underreporting. It didn’t feel useful to add in hand-wavy multipliers. Second — and this is where the semi-logarithmic graph again comes to the rescue — adding a multiplier won’t change the slope, which is what I’m really interested in. It just moves the curve up or down.
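The reason is that log10(k * y) = log10(k) + log10(y), so multiplying (or dividing) every point by a constant only adds a constant to the log values. Here is a small Python check of that, using a plain least-squares slope. The case counts, the multiplier of 10, and the 330 million population figure are all hypothetical inputs for illustration, not real data.

```python
import math
from typing import List


def log_slope(values: List[float]) -> float:
    """Least-squares slope of log10(values) against day index.

    Needs at least two points, all positive.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    ys = [math.log10(v) for v in values]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


reported = [12, 18, 25, 40, 52, 80, 115, 160]   # hypothetical daily new cases
adjusted = [10 * v for v in reported]           # hypothetical underreporting multiplier
per_million = [v / 330 for v in reported]       # ASSUMPTION: ~330 million people

for label, series in (("reported", reported),
                      ("adjusted", adjusted),
                      ("per million", per_million)):
    print(f"{label:>12}: slope = {log_slope(series):.4f}")
```

All three slopes agree up to floating-point noise. The catch is that this only holds if the multiplier is constant over time. If testing ramps up partway through, the slope does change.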
Remember, these are all lagging indicators anyway. In all likelihood, any changes we make today won’t be reflected for at least a week. What’s done is done. The best thing we can do right now is to be as proactive as possible, given the circumstances. If we’re going to implement policies like Italy has, it’s better that it happens today than a week from now. Public policy aside, there is one thing we all can do that will absolutely make a difference: STAY HOME!
Many thanks to Martin Cleaver and Matt Bruce for sharing my previous blog post, which led to a lot of the discussion that shaped this latest iteration. And many thanks to all who have engaged with this so far. Stay home, wash your hands, and take care!
Thanks to those of you who commented on my post last night on my attempts to better understand what’s happening with Coronavirus and how we’re currently doing here in the U.S. My friend, Raj, suggested I do a cleaner version, so I put the data in this Google Spreadsheet and let technology do its thing:
A reminder: These lines represent normalized (by population) daily new cases in the U.S. (blue), China (red), South Korea (yellow), and Italy (green). I haven’t seen anyone else normalize by population, which helps make more of an apples-to-apples comparison. The closest thing I’ve seen is Our World In Data’s sparklines, which are wonderful. (Hat tip to Phoebe Ayers for the pointer.)
I also made two improvements from my previous version:
The graphs are generated from precise data points rather than my back-of-envelope calculations and sketches. I also made the spreadsheet I used public so that others can double-check or re-use.
I picked a more precise “Day 0” for each country — the first day with zero new cases followed by a bunch of non-zero days. This worked out to February 27 for the U.S., January 22 for China, February 18 for South Korea, and February 20 for Italy.
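I picked those dates by eye, but the "Day 0" rule can be written down roughly like this in Python. The `min_run` parameter is a hypothetical stand-in for "a bunch" (I never pinned down an exact number), and the sample series is made up.

```python
from typing import List, Optional


def find_day_zero(daily_new: List[int], min_run: int = 5) -> Optional[int]:
    """Index of the first zero-case day followed by `min_run` non-zero days.

    `min_run` is a hypothetical threshold for "a bunch". Returns None if
    no day in the series qualifies.
    """
    for i, count in enumerate(daily_new):
        following = daily_new[i + 1 : i + 1 + min_run]
        if count == 0 and len(following) == min_run and all(c > 0 for c in following):
            return i
    return None


# Hypothetical series: early isolated cases, then sustained growth from index 3.
series = [0, 1, 0, 0, 2, 3, 5, 8, 13]
print(find_day_zero(series))
```

A larger `min_run` is stricter about what counts as the start of sustained spread, so it can push Day 0 later for countries with a few scattered early cases.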
Unlike my previous version, I’m showing the full Italy curve. (Wow.) Here’s a zoomed-in version that gives us a better sense of what’s happening in the U.S. (and is also pretty close to last night’s rough sketch, which makes me happy):
The graph suggests that we’ve been able to “flatten the curve” so far, and that aggressive measures by local government and businesses are probably working. However, seeing the curve jump like Italy’s is still not out-of-the-question. We still don’t have widespread testing in this country (although there are positive signs), and — as my friend Sheldon Chang observed — we’re unlikely to be able to implement the aggressive, targeted, digital surveillance that they’re able to do in Asia. More aggressive containment is still a possibility, but for now, I feel like I’m able to breathe a bit easier. Stay vigilant, everyone! Keep your physical distance, wash your hands, and take care!