
Leading with the Unknowns in COVID-19 Models

In times of great uncertainty, we should look beyond the data

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American.


As the U.S. tops the chart on COVID-19 cases and growth rate, the theme of regret is ubiquitous in the media. Lost time that could have been spent enacting more stringent distancing measures weighs on the minds of many leaders and citizens. As a researcher in uncertainty visualization, I fear a different sort of regret from our response to COVID-19.

Many visualizations, including variations on the widely distributed Flatten the Curve graph, represent estimates produced by models. These models simulate the number of people who would be infected, require hospitalization, or die under different conditions. Flatten the Curve adapts a visualization first presented by the CDC in 2007 to compare such estimates under different levels and durations of social distancing. The author added a dotted line to represent his estimate of the number of available hospital beds in the country.

It is easy to perceive the predictions of cases and deaths from simulations as complete depictions of what we can expect based on what we know. For one thing, these models take in multiple streams of available data: on COVID-19 cases, deaths, and rates of hospitalization; on how quickly COVID-19 spread under different conditions elsewhere in the world; and on how related viruses have spread in the past, to name a few.


Model results are powerful because from them we can calculate risks. For example: How much more likely is it that our death rate reaches 10 percent of cases, as it has in Italy, under distancing measures? How likely is it that the virus will peak in two to three weeks? By quantifying unknowns, estimates of risk make clear that what will happen is not completely certain, but they can nonetheless empower us to make decisions and weigh trade-offs.
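As a loose illustration of how such a risk can be read off a model’s output, the Python sketch below uses made-up numbers rather than output from any real epidemiological model: it treats a batch of simulated outcomes as draws from a model and simply counts how often the outcome of interest occurs.

```python
import numpy as np

# Hypothetical example: suppose a model produced 10,000 simulated
# epidemic runs, each yielding a case fatality rate (CFR).
rng = np.random.default_rng(0)
simulated_cfr = rng.beta(2, 40, size=10_000)  # stand-in for real model output

# "Risk" of the death rate reaching 10 percent of cases, under this model:
p_cfr_above_10pct = np.mean(simulated_cfr >= 0.10)
print(f"P(CFR >= 10%) = {p_cfr_above_10pct:.3f}")
```

The number that comes out is only as trustworthy as the simulated outcomes that went in, which is exactly the point of what follows.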

What worries me as an expert in reasoning under uncertainty is a more difficult type of uncertainty: the uncertainty that arises from the many unknowns underlying COVID-19 data and models. We can’t easily quantify this uncertainty, and it is easy to overlook, since it is not conveyed by model estimates alone.

One form of unquantifiable uncertainty stems from our limited ability to estimate how accurate the data fed into these models are. Available data on COVID-19 case counts are likely to be unreliable as a result of large differences in the scale of testing in different locations, combined with inconsistencies in how testing is applied within a single location. This leads to apples-to-oranges comparisons of case numbers. A larger number of cases in one place, or even a higher rate of cases per capita, does not necessarily equate to a higher risk. More likely, it means health providers are testing more broadly in that location. Until we implement more comprehensive, nonselective testing, we can’t quantify exactly how at risk of bias these data are.

Data on deaths from COVID-19 are likely to be more reliable, but may still be far from perfect. For example, it may be difficult to determine whether COVID-19 or a preexisting condition caused an elderly patient’s death. Community decision-makers may also be incentivized to underreport deaths to avoid spreading panic or crippling a local economy.

A second form of unquantifiable uncertainty stems from the fact that models are often gross simplifications of real-world situations. Many of the models being used to forecast our future under COVID-19 make strong assumptions that seem contradicted by what we expect in reality. Models vary in the assumptions they make about the mechanism behind disease transmission. Some approaches focus on fitting curves to available data rather than assuming mechanisms that account for realities like incubation periods and immunity after infection.

Others account for these dynamics but make strong assumptions about the predictability of human behavior in the face of a crisis. Sometimes called ambiguity, this non-numerical uncertainty, the unquantifiable inexactness of a model as a stand-in for reality, means that our predictions could be off, by a little or by a lot, depending on how flawed the model assumptions are. “All models are wrong, but some are useful,” said the statistician George Box, reminding us of the tension between treating models as tools for thinking and expecting them to be oracles. Unfortunately, a careful critique of model assumptions, like other forthright presentations of uncertainty, rarely makes it into the public-facing articles or visualizations used to present the results.
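To make that distinction concrete, here is a small, hypothetical Python sketch using invented early case counts rather than real data. It contrasts a bare exponential curve fit, which assumes no mechanism at all, with a simple SIR-style mechanistic model in which infections slow as the susceptible pool shrinks. The two can describe the same early data reasonably well yet diverge wildly in what they predict later.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

# Hypothetical early case counts for days 0-9 (not real data).
days = np.arange(10)
cases = np.array([5, 7, 11, 16, 24, 35, 52, 76, 110, 160], dtype=float)

# Approach 1: fit an exponential curve -- no mechanism, no immunity.
exp_model = lambda t, a, r: a * np.exp(r * t)
(a, r), _ = curve_fit(exp_model, days, cases, p0=(5, 0.3))

# Approach 2: a mechanistic SIR model -- infections slow as the
# susceptible pool is depleted (a crude stand-in for acquired immunity).
def sir(y, t, beta, gamma, n):
    s, i, rec = y
    return [-beta * s * i / n, beta * s * i / n - gamma * i, gamma * i]

n = 100_000  # assumed population size
traj = odeint(sir, [n - 5, 5, 0], np.arange(60), args=(0.45, 0.1, n))

# The exponential fit grows without bound; the SIR curve peaks and declines.
print(exp_model(60, a, r), traj[:, 1].max())
```

Neither forecast is “the” answer; each is a consequence of the assumptions baked into it.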

It is especially easy to overlook the strength of the assumptions models make because their predictions can seem comprehensive. Rather than producing a single number like a count, a model typically produces a set of predicted outcomes. Flatten the Curve, for example, shows two areas representing case counts over time: one if we enact protective measures and one if we don’t. A predicted number of infections is shown for each day after the first confirmed case.

Even when the quantifiable uncertainty associated with the model predictions is not shown (in this case, we do not see the other values that the predicted case counts for each day could take under the model assumptions), visualizations like Flatten the Curve can imply completeness through the series of predictions they produce. For many, seeing a graphical depiction of distributions of possibilities over time or space may seem like the epitome of scientific carefulness. Behind the seemingly precise visuals, however, are a number of approximations.
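One way to see what such a chart leaves out: if a model is run many times under plausible parameter settings, each day’s prediction is really a spread of values, not a single number. The sketch below uses entirely synthetic trajectories, not output from any published model, to draw a single median curve alongside a shaded band covering most of the simulated runs.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ensemble of 500 simulated daily-case trajectories,
# standing in for repeated model runs under sampled parameters.
rng = np.random.default_rng(1)
days = np.arange(120)
peaks = rng.normal(60, 10, size=500)        # uncertain peak timing
heights = rng.lognormal(8, 0.4, size=500)   # uncertain peak height
trajectories = heights[:, None] * np.exp(-((days - peaks[:, None]) / 20) ** 2)

# A single "Flatten the Curve"-style line shows only one summary...
plt.plot(days, np.median(trajectories, axis=0), label="median prediction")
# ...while a shaded band reveals the other values the model admits.
lo, hi = np.percentile(trajectories, [5, 95], axis=0)
plt.fill_between(days, lo, hi, alpha=0.3, label="90% of simulated runs")
plt.xlabel("days since first confirmed case")
plt.ylabel("predicted daily cases")
plt.legend()
plt.show()
```

Even this band only captures the uncertainty the model can quantify; the unquantifiable kind, flawed data and flawed assumptions, lies outside it.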

Does the presence of uncertainty make the extreme social distancing measures being enacted in many states an overreaction? Not necessarily. In the absence of good estimates of risk, it is rational to guard against worst-case outcomes. It’s the best we can do until we get better data.

What is dangerous is failing to recognize the difference between model predictions made now, based on limited information and strong assumptions, and the more reliable data that will emerge over time as the virus plays out. If early model predictions turn out to significantly overestimate or underestimate COVID-19 deaths or the risk to our health system, many may blame the scientists for being wrong. They may trust data-driven estimates less in the future.

Clear presentation of uncertainty can make model estimates seem less reassuring, but it can prevent people from blaming the forecaster, or the scientific enterprise itself, when, as we should expect, the model is wrong. Trading away future public trust in science is not worth feeling more assured in the short term, no matter how much we seek to eliminate uncertainty.