Update: More recent iterations are available:
Many thanks for all of the feedback on my latest attempt to make sense of the Coronavirus pandemic. I listened, and I played, and I learned, and I now have a new graph that I think better represents how the U.S. is doing:
This is a semi-logarithmic graph of daily new cases over time. I’m comparing the U.S. (blue and bold), China (red), South Korea (yellow), and Italy (green). I’ll explain the changes I made from last time below, but first, three quick takeaways:
First, don’t trust my analysis. I’m an amateur at this, my math is incredibly rusty, and it turns out that my statistics (which were always suspect) are even more suspect than I thought. Critiques, corrections, and constructive discussion encouraged!
Second, don’t passively consume what you read. This started off as a quick exercise to try to make sense of the craziness. It’s led to lots of encouragement, but also lots of (welcome) critiques, which have helped me sharpen my analysis, correct some assumptions, and feel like I have a better grasp of what’s going on and how to assess other things I read. I feel a lot better than I did a few days ago.
Third, the U.S. isn’t doing great right now. Our line is more or less tracking China, Korea, and Italy’s initial growth rate, but both China and Korea had started slowing their growth rate about a week before we did. Our curve is looking a lot more like Italy’s, which does not speak well of the weeks ahead here.
Here are the changes I made from last time:
First, I switched over to a semi-logarithmic graph. Hat tip to Ken Chase for encouraging me to do this and to Matt Bruce for this pointer as to why this is important.
When I originally started mapping this out, I didn’t think that the semi-logarithmic graph would tell me much more than the linear graph did. Italy was off the chart, which was all I felt like I needed to know, and I felt like I could make sense of the rest of the curves on the chart. Still, after receiving this feedback, I decided to try the semi-logarithmic version to see what I would learn. As you can see above, my conclusion changed quite dramatically. We in the U.S. are not doing well right now. You can see that the slope of our graph tracks quite closely with Italy.
The other (more math-y) benefit is that I can measure the slope of the line (0.15), which gives me a rough exponential equation for how viral Coronavirus has been in both the U.S. and Italy (y = 10^(0.15x)). China and Korea’s containment slope (-0.1) provides the rough equation for the potential impact of containment (y = 10^(-0.1x)). You can use these equations to model out different scenarios (which my friend, Charlie Graham, has been doing).
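Here is a minimal sketch of what modeling a scenario with these slopes might look like. It assumes the slopes are base-10 exponents per day (my reading of the graph, not something stated above), and the function names and starting numbers are made up for illustration.

```python
import math

# Slopes read off the semi-logarithmic graph (from the post).
GROWTH_SLOPE = 0.15        # U.S. and Italy, uncontained
CONTAINMENT_SLOPE = -0.1   # China and Korea, after containment


def projected_cases(start_cases, slope, days):
    """Daily new cases after `days`, assuming y = start * 10^(slope * days)."""
    return start_cases * 10 ** (slope * days)


def doubling_time(slope):
    """Days for daily new cases to double (or halve, if slope is negative)."""
    return math.log10(2) / abs(slope)


if __name__ == "__main__":
    # Hypothetical scenario: 1,000 daily new cases today, no containment.
    for day in (0, 7, 14):
        print(day, round(projected_cases(1000, GROWTH_SLOPE, day)))
    print("doubling time (days):", round(doubling_time(GROWTH_SLOPE), 1))
    print("halving time (days):", round(doubling_time(CONTAINMENT_SLOPE), 1))
```

Under these assumptions, a slope of 0.15 means daily new cases double roughly every two days, and a slope of -0.1 means they halve roughly every three days.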
Second, I’m no longer normalizing by population. Two people questioned whether or not this was useful. (Thank you, Majken Longlade and Corey O’Hara.) Their argument was that normalizing by population doesn’t tell you much in the case of epidemics, because transmission and virality are more a function of closeness and density. The U.S. is a huge country geographically compared to Korea or Italy. The better approach would be to try to normalize based on population of regional outbreaks.
I agree, and I’d like to try to do this. The data is a lot harder to come by, but I think it would be possible to manually pull together with a little bit of elbow grease, especially if we’re constraining the countries where we’re trying to do this. (Leave a comment below if you’d be interested in trying this.)
(Side note: I am extremely grateful for the wide availability of data and for all the people doing an incredible job of analyzing and sharing. This is no accident. A lot of folks have invested an incredible amount of time and energy over the past two decades advocating for the open web and open data, all of which is required to make this work. This becomes even more clear when realizing what’s missing. Martin Cleaver asked me to include Canada in my graph. Easy enough, I thought. Turns out it’s not. Canada doesn’t make this data available. Majken shared this article that explains Canada’s situation.)
Nevertheless, I thought it was still useful to look at data normalized by population. My reasoning was that, at worst, it wouldn’t make the data worse, and, at best, it might make it better. I decided I would try to test this assumption by comparing the graphs of the normalized data with the non-normalized data above:
Again, I think switching to a semi-logarithmic graph made a difference, because if you compare these two graphs, the slopes (which I’m most concerned about) are largely the same. Normalizing the data doesn’t impact the slopes. On the one hand, my assumption was correct: normalizing didn’t seem to hurt the data. On the other hand, it didn’t tell me anything new, either. So, I decided to stop normalizing and stick with the data as is. (I’d still like to try normalizing by outbreak region, though.)
One point that came up often was that these graphs don’t take into account the underreporting in the U.S. due to lack of testing. I tried to take this into account in my very first sketch, as I mentioned in my original blog post. However, I decided to move away from this for a few reasons. First, every country is underreporting. It didn’t feel useful to add in hand-wavy multipliers. Second — and this is where the semi-logarithmic graph again comes to the rescue — adding a multiplier won’t change the slope, which is what I’m really interested in. It just moves the curve up or down.
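To convince myself of the multiplier point, here is a small sketch that fits a straight line to log10 of the daily cases and compares the slope before and after scaling. The data is synthetic (generated from the 0.15 slope, not real case counts), and the 10x multiplier is purely hypothetical. Dividing by a country’s population is the same kind of constant scaling, so the same reasoning applies to normalizing.

```python
import math


def log_slope(daily_cases):
    """Least-squares slope of log10(daily cases) against the day index."""
    n = len(daily_cases)
    xs = list(range(n))
    ys = [math.log10(c) for c in daily_cases]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic series, NOT real data: 100 cases on day 0, growing at slope 0.15.
reported = [100 * 10 ** (0.15 * day) for day in range(14)]

# Hypothetical correction: pretend the true count is 10x what was reported.
adjusted = [10 * cases for cases in reported]

if __name__ == "__main__":
    print("reported slope:", round(log_slope(reported), 4))
    print("adjusted slope:", round(log_slope(adjusted), 4))
```

Because log10(k * y) = log10(k) + log10(y), the multiplier only adds a constant to every point on the semi-logarithmic graph, so the fitted slope comes out the same either way.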
Remember, these are all lagging indicators anyway. In all likelihood, any changes we make today won’t be reflected for at least a week. What’s done is done. The best thing we can do right now is to be as proactive as possible, given the circumstances. If we’re going to implement policies like Italy has, it’s better that it happens today than a week from now. Public policy aside, there is one thing we all can do that will absolutely make a difference: STAY HOME!
One more aside: My friend, Greg Gentschev, has often said that the best thing we can do to become better systems thinkers and doers is to learn how compounding works. (Turns out that the physicist, Albert Allen Bartlett, said this too. Great minds!) Maybe one of the positive outcomes of all this is that this will start to happen. I’ve seen two great resources for this so far. One is the Washington Post’s Coronavirus Simulator, which they published yesterday. The other is this video on exponential growth and epidemics. (Hat tip to Nicky Case and James Cham.)
Many thanks to Martin Cleaver and Matt Bruce for sharing my previous blog post, which led to a lot of the discussion that shaped this latest iteration. And many thanks to all who have engaged with this so far. Stay home, wash your hands, and take care!