Last night, some friends and I were talking about conspiracy theories and the U.S. government, which led my friend, Greg, to tell us about Operation Sea-Spray. In September 1950, the U.S. Navy sprayed a cloud of the microbe Serratia marcescens into the air two miles off the coast of San Francisco in order to see how susceptible we might be to germ warfare.
Let’s pause for a moment to reflect on this logic. In order to see how susceptible we were to germ warfare, the U.S. government decided to unleash germ warfare on its own citizens.
In fairness to the government, they chose a “harmless” microbe. You’ve probably seen Serratia marcescens before. It’s the same microbe that forms pink streaks in your toilet and shower when you neglect to clean them.
A week after the spraying, eleven patients were admitted to the now-defunct Stanford University Hospital in San Francisco with severe urinary tract infections, resistant to the limited antibiotics available in that era. One gentleman, recovering from prostate surgery, developed complications of heart infection as Serratia colonized his heart valves. His would be the only death during the aftermath of the experiment.
Stanford University Hospital doctors culturing the patients’ urine on petri dishes made an unusual and unexpected discovery: microbes blushing with a cherry red pigment. Infection with Serratia was so rare that the outbreak was extensively investigated by the University to identify the origins of this scarlet letter bug. Though the source of this unusual organism could not be located despite an exhaustive epidemiological search, Stanford published a report on the outbreak, noting that “the isolation of a red pigment-producing bacterium from the urine of human beings was of interest, at first, as a curious clinical observation. Later, the repeated occurrence of urinary-tract infection by this organism, with bacteremia in two patients and death in one, indicated the potential clinical importance of this group of bacteria.” It was the first recorded outbreak of Serratia in the history of microbiology.
The government didn’t disclose these experiments — a clear violation of the Nuremberg Code — publicly for another 27 years.
There’s a reason why people don’t trust institutions and are susceptible to conspiracy theories. If you want to undo the damage from incidents like these, you have to acknowledge what you’ve done in the past and work intentionally to rebuild trust.
My morning ritual for the past week has been to update my COVID-19 spreadsheet and ponder my chart. Here’s the latest:
On the one hand, if you compare it to last week’s chart, it’s not a happy result for those of us in the U.S. (Italy’s curve might be flattening. We’ll see next week.) On the other hand, remember that this is a lagging indicator. This past week’s line was essentially pre-determined by what happened the previous week. Earlier this week, the Bay Area instituted shelter-in-place. Shortly thereafter, California made it statewide, and New York and Illinois followed suit after that. We’ll see if this has any noticeable impact next week.
I made one slight tweak to the graph (adding labels to the axes; thanks to Kate Wing for the gentle scolding). I’d like to change the gridlines on the x-axis to every seven days, but can’t do that in Google Sheets. Not a huge deal. I’d also like to experiment with a log 2 graph (versus log 10) on the y-axis to more easily show how many days it takes for new cases to double, but again, can’t do that from Google Sheets. Again, not a big deal. I’d also like to do a region-by-region analysis, as suggested by many others and made possible by David Janes’ data, but haven’t gotten around to that yet.
I started doing all of this as an exercise in self-care. I wanted to understand what was happening, and I found what I was reading to be not just largely unhelpful, but actually debilitating. This has helped a lot. There is something very calming about looking up numbers, plugging them into a spreadsheet, and pondering the results, even if the results aren’t very good. (Come to think of it, this also played a huge role in helping me achieve better work-life balance, so it might be a pattern.) I haven’t been able to avoid the media as much as I hoped, but it’s helped me make sense of what I’m seeing and ignore articles and missives that are generally unhelpful or worse. It’s also validating when folks who understand this stuff far better than me are coming to similar conclusions.
I’ve loved seeing friends and others play with the data as well. One of the best websites I’ve seen is Covid Act Now, which shows state-by-state projections based on hospital capacity and what we understand about different interventions. They’ve also shared their model openly, and they’re posting the right disclaimers. (Good rule of thumb: Be skeptical of anyone who claims certainty about their conclusions unless they’re an epidemiologist, and even then, take everything with a grain of salt.)
I’m also inspired by everyone working on the front lines — from health care workers to domestic workers — and to those who are doing their part to support those who are. (Hat tip to Jon Stahl for sharing the amazing work that Carl Coryell-Martin instigated, for example.) Stay safe everyone, stay at home if you can, and be well.
Many thanks for all of the feedback about my latest attempt to make sense of the Coronavirus pandemic. I listened, and I played, and I learned, and I now have a new graph that I think better represents how the U.S. is doing:
This is a semi-logarithmic graph of daily new cases over time. I’m comparing the U.S. (blue and bold), China (red), South Korea (yellow), and Italy (green). I’ll explain the changes I made from last time below, but first, three quick takeaways:
First, don’t trust my analysis. I’m an amateur at this, my math is incredibly rusty, and it turns out that my statistics (which were always suspect) are even more suspect than I thought. Critiques, corrections, and constructive discussion encouraged!
Second, don’t passively consume what you read. This started off as a quick exercise to try to make sense of the craziness. It’s led to lots of encouragement, but also lots of (welcome) critiques, which have helped me sharpen my analysis, correct some assumptions, and feel like I have a better grasp of what’s going on and how to assess other things I read. I feel a lot better than I did a few days ago.
Third, the U.S. isn’t doing great right now. Our line is more or less tracking China, Korea, and Italy’s initial growth rate, but both China and Korea had started slowing their growth rate about a week before we did. Our curve is looking a lot more like Italy’s, which does not speak well of the weeks ahead here.
Here are the changes I made from last time:
First, I switched over to a semi-logarithmic graph. Hat tip to Ken Chase for encouraging me to do this and to Matt Bruce for this pointer as to why this is important.
When I originally started mapping this out, I didn’t think that the semi-logarithmic graph would tell me much more than the linear graph did. Italy was off the chart, which was all I felt like I needed to know, and I felt like I could make sense of the rest of the curves on the chart. Still, after receiving this feedback, I decided to try the semi-logarithmic version to see what I would learn. As you can see above, my conclusion changed quite dramatically. We in the U.S. are not doing well right now. You can see that the slope of our graph tracks quite closely with Italy.
The other (more math-y) benefit is that I can measure the slope of the line (0.15), which gives me a rough exponential model for how viral Coronavirus has been in both the U.S. and Italy (y = 10^(0.15x), where x is the number of days). China and Korea’s containment slope (-0.1) provides a rough exponential model for the potential impact of containment (y = 10^(-0.1x)). You can use these equations to model out different scenarios (which my friend, Charlie Graham, has been doing).
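For anyone who wants to play with those two slopes, here is a minimal Python sketch of how they could be used to model scenarios. It is my own illustration under stated assumptions, not Charlie's model: the function names and the starting figure of 1,000 daily cases are made up, and only the slopes 0.15 and -0.1 come from the graph.

```python
import math

# Slopes read off the semi-log graph (change in log10 of daily new cases per day).
GROWTH_SLOPE = 0.15       # U.S. and Italy, from the post
CONTAINMENT_SLOPE = -0.1  # China and Korea after containment, from the post


def project(start_cases, slope, days):
    """Daily new cases after `days` days, following start * 10^(slope * days)."""
    return start_cases * 10 ** (slope * days)


def days_to_change_by(factor, slope):
    """Days needed for daily new cases to change by `factor` at the given slope."""
    return math.log10(factor) / slope


# Hypothetical scenario (made-up starting point): 1,000 daily new cases today,
# one more week of unchecked growth, then two weeks at the containment slope.
peak = project(1000, GROWTH_SLOPE, 7)
after_containment = project(peak, CONTAINMENT_SLOPE, 14)

print(f"Days to double while growing: {days_to_change_by(2, GROWTH_SLOPE):.1f}")
print(f"Days to halve under containment: {days_to_change_by(0.5, CONTAINMENT_SLOPE):.1f}")
print(f"Daily cases after a week of growth: {peak:.0f}")
print(f"Daily cases after two more weeks of containment: {after_containment:.0f}")
```

At a slope of 0.15, daily new cases double roughly every two days. At a slope of -0.1, they halve roughly every three days.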
Second, I’m no longer normalizing by population. Two people questioned whether or not this was useful. (Thank you, Majken Longlade and Corey O’Hara.) Their argument was that normalizing by population doesn’t tell you much in the case of epidemics, because transmission and virality are more a function of closeness and density. The U.S. is a huge country geographically compared to Korea or Italy. The better approach would be to try to normalize based on population of regional outbreaks.
I agree, and I’d like to try to do this. The data is a lot harder to come by, but I think it would be possible to manually pull together with a little bit of elbow grease, especially if we’re constraining the countries where we’re trying to do this. (Leave a comment below if you’d be interested in trying this.)
(Side note: I am extremely grateful for the wide availability of data and for all the people doing an incredible job of analyzing and sharing. This is no accident. A lot of folks have invested an incredible amount of time and energy over the past two decades advocating for the open web and open data, all of which is required to make this work. This becomes even more clear when realizing what’s missing. Martin Cleaver asked me to include Canada in my graph. Easy enough, I thought. Turns out it’s not. Canada doesn’t make this data available. Majken shared this article that explains Canada’s situation.)
Nevertheless, I thought it was still useful to look at data normalized by population. My reasoning was that, at worst, it wouldn’t make the data worse, and, at best, it might make it better. I decided I would try to test this assumption by comparing the graphs of the normalized data with the non-normalized data above:
Again, I think switching to a semi-logarithmic graph made a difference, because if you compare these two graphs, the slopes (which I’m most concerned about) are largely the same. Normalizing the data doesn’t impact the slopes. On the one hand, my assumption was correct: normalizing didn’t seem to hurt the data. On the other hand, it didn’t tell me anything new, either. So, I decided to stop normalizing and stick with the data as is. (I’d still like to try normalizing by outbreak region, though.)
One point that came up often was that these graphs don’t take into account the underreporting in the U.S. due to lack of testing. I tried to take this into account in my very first sketch, as I mentioned in my original blog post. However, I decided to move away from this for a few reasons. First, every country is underreporting. It didn’t feel useful to add in hand-wavy multipliers. Second — and this is where the semi-logarithmic graph again comes to the rescue — adding a multiplier won’t change the slope, which is what I’m really interested in. It just moves the curve up or down.
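The reason a multiplier can't change the slope is that log10(k × y) = log10(k) + log10(y): scaling every point by the same k only adds a constant to the logged values. Here is a small Python sketch that checks this numerically. The case counts and the multiplier of 10 are made up for illustration, and the least-squares fit is just one reasonable way to measure a slope, not necessarily how I measured mine.

```python
import math


def log10_slope(daily_cases):
    """Least-squares slope of log10(cases) against day number."""
    xs = list(range(len(daily_cases)))
    ys = [math.log10(c) for c in daily_cases]
    n = len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Made-up illustrative series, not real case counts.
reported = [12, 18, 25, 37, 51, 76, 104]

# Hypothetical underreporting multiplier applied to every day.
MULTIPLIER = 10
adjusted = [MULTIPLIER * c for c in reported]

print(f"Slope of reported data: {log10_slope(reported):.4f}")
print(f"Slope of adjusted data: {log10_slope(adjusted):.4f}")
```

The two slopes come out the same. The same argument covers dividing by a population factor, which is just a multiplier smaller or larger than one.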
Remember, these are all lagging indicators anyway. In all likelihood, any changes we make today won’t be reflected for at least a week. What’s done is done. The best thing we can do right now is to be as proactive as possible, given the circumstances. If we’re going to implement policies like Italy has, it’s better that it happens today than a week from now. Public policy aside, there is one thing we all can do that will absolutely make a difference: STAY HOME!
Many thanks to Martin Cleaver and Matt Bruce for sharing my previous blog post, which led to a lot of the discussion that shaped this latest iteration. And many thanks to all who have engaged with this so far. Stay home, wash your hands, and take care!
Thanks to those of you who commented on my post last night on my attempts to better understand what’s happening with Coronavirus and how we’re currently doing here in the U.S. My friend, Raj, suggested I do a cleaner version, so I put the data in this Google Spreadsheet and let technology do its thing:
A reminder: These lines represent normalized (by population) daily new cases in the U.S. (blue), China (red), South Korea (yellow), and Italy (green). I haven’t seen anyone else normalize by population, which helps make more of an apples-to-apples comparison. The closest thing I’ve seen is Our World In Data’s sparklines, which are wonderful. (Hat tip to Phoebe Ayers for the pointer.)
I also made two improvements from my previous version:
The graphs are generated from precise data points rather than my back-of-envelope calculations and sketches. I also made the spreadsheet I used public so that others can double-check or re-use.
I picked a more precise “Day 0” for each country — the first day with zero new cases followed by a bunch of non-zero days. This worked out to February 27 for the U.S., January 22 for China, February 18 for South Korea, and February 20 for Italy.
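The "Day 0" rule above can be sketched in a few lines of Python. I picked the dates by eye, so this is only an approximation of what I did: the function name is made up, and "a bunch" is turned into a hypothetical threshold of five consecutive non-zero days.

```python
def find_day_zero(daily_new_cases, min_run=5):
    """Return the index of the first zero-case day that is immediately
    followed by at least `min_run` consecutive non-zero days, or None.

    `min_run` is an assumed stand-in for "a bunch of non-zero days".
    """
    for i, cases in enumerate(daily_new_cases):
        if cases != 0:
            continue
        run = daily_new_cases[i + 1 : i + 1 + min_run]
        if len(run) == min_run and all(c > 0 for c in run):
            return i
    return None


# Made-up series: an isolated early case, a lull, then sustained growth.
series = [0, 1, 0, 0, 2, 3, 5, 8, 13, 21]
print(find_day_zero(series))
```

Aligning each country's series on its own Day 0 is what lets the curves be overlaid on a single x-axis even though the outbreaks started on different calendar dates.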
Unlike my previous version, I’m showing the full Italy curve. (Wow.) Here’s a zoomed-in version that gives us a better sense of what’s happening in the U.S. (and is also pretty close to last night’s rough sketch, which makes me happy):
The graph suggests that we’ve been able to “flatten the curve” so far, and that aggressive measures by local government and businesses are probably working. However, seeing the curve jump like Italy’s is still not out of the question. We still don’t have widespread testing in this country (although there are positive signs), and, as my friend Sheldon Chang observed, we’re unlikely to be able to implement the aggressive, targeted, digital surveillance that they’re able to do in Asia. More aggressive containment is still a possibility, but for now, I feel like I’m able to breathe a bit easier. Stay vigilant, everyone! Keep your physical distance, wash your hands, and take care!
Like most folks I know, I’ve been feeling increasingly stressed about the Coronavirus pandemic. I had done my best to educate myself and prepare, but I’ve been surprised by how scared and anxious I’ve been this past week.
Early on, my social media feed was invaluable at helping me understand what was happening. Now, it’s just causing me stress. Yesterday, I decided to try to limit my social media (and media) exposure. Instead, I would check the daily new cases graph once-ish a day, then just live my life. I’ve been primarily using worldometer, but I switch to The New York Times (which is updated more frequently and comes with news summaries) when I get antsy.
My reasoning was simple. Coronavirus is here in the U.S., and it’s spreading. (Because of lack of testing, we likely have many more cases than currently reported.) We missed our opportunity for containment, so now it’s all about mitigation. Most of the commentary doesn’t offer any real insight into how we’re actually doing in that regard, so I’m better off mostly ignoring it. The curve gives me real data on how we’re doing.
The problem is that it’s hard for me to gauge anything from this data other than that we’re on the growth-side of the curve, which I already know. I decided to map some additional data onto the curve to see if that helped. I looked at three other countries: China, South Korea, and Italy. China and South Korea have, by all accounts, handled things well. I’m not sure if Italy is handling things poorly, but — by all accounts — things are going poorly there. I figured that comparing these three data sets with the U.S. curve would give me a better sense of how we’re doing and what to possibly expect.
I looked at roughly a month of data for all four countries. Cases in South Korea, Italy, and the U.S. all started coming up around the same time, so I could actually use data from the same time period. Things started blowing up in China roughly a month earlier, so I took the earlier data and mapped it onto the current time period. The key step I took that I haven’t seen in any other charts so far was to normalize the data by population (South Korea = 0.15; Italy = 0.19; U.S. = 1; China = 4.35).
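Here is a rough Python sketch of that normalization step. It assumes the factors above are each country's population relative to the U.S., and that normalizing means dividing each day's count by the factor, so every curve reads as if the country were the size of the U.S. The function name and the sample numbers are made up.

```python
# Population relative to the U.S. (factors from the post).
POPULATION_FACTOR = {
    "South Korea": 0.15,
    "Italy": 0.19,
    "U.S.": 1.0,
    "China": 4.35,
}


def normalize(daily_new_cases, country):
    """Scale daily new cases to a U.S.-sized population.

    Assumes normalization is a simple division by the population factor.
    """
    factor = POPULATION_FACTOR[country]
    return [cases / factor for cases in daily_new_cases]


# Made-up counts for illustration only.
print(normalize([150, 300, 600], "South Korea"))
print(normalize([435, 870], "China"))
```

Under this assumption, smaller countries' counts are scaled up and China's are scaled down, while the U.S. series is left unchanged.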
Here’s what I came up with:
The orange curve is the U.S. data. The dotted line is a worst-case projection of where we actually are, based on the death rate. (See Mona Chalabi’s excellent Instagram post, which uses analysis from Tomas Pueyo, for more on this.) I did not do a worst-case projection for South Korea (which could also be about 10 times off), Italy (which could be as much as 100 times off), or China (Mona didn’t include China in her graphic). I also didn’t represent the spike in China’s data that arose when they changed how they were testing, as it’s accounted for in the peak and subsequent data.
Here’s how I read this: China did an amazing job of managing the situation. South Korea had an awful spike, and somehow managed to turn it around. Italy — wow. Things are not good in Italy. Right now, we in the U.S. are doing okay, but it’s still very early, and it remains to be seen what our curve will look like. However, at least now I have some points of comparison.
Doing this exercise made me feel much better. Feedback (especially critiques and corrections) encouraged! Stay diligent, keep your (physical) distance, wash your hands, and take care of yourselves!