October 2, 2015

Can We Improve Predictions? Q&A with Philip "Superforecasting" Tetlock

Social psychologist Philip Tetlock answers questions about his new book Superforecasting: The Art and Science of Prediction.

By John Horgan

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

I’ve been hard on social science, even suggesting that “social science” is an oxymoron. I noted, however, that social science has enormous potential, especially when it combines “rigorous empiricism with a resistance to absolute answers.”

The work of Philip Tetlock possesses these qualities, and it addresses a fundamental question: How predictable are social events? His early research, which assessed experts’ ability to foresee things like elections, economic collapses and wars, highlighted the difficulties of prediction. See, for example, how I cite him in a column on whether the public should defer to the judgment of scientific experts.

Tetlock’s new book Superforecasting: The Art and Science of Prediction, co-written with journalist Dan Gardner, is much more upbeat. The book has already received raves from The Economist, The Wall Street Journal, former Treasury Secretary Robert Rubin, psychologist Steven Pinker, Nobel laureate Daniel Kahneman and others.


I blurbed the book a few months ago. Tetlock, I wrote, “shows that certain people can forecast events with accuracy much better than chance—and so, perhaps, can the rest of us, if we emulate the critical thinking of these ‘superforecasters.’ The self-empowerment genre doesn’t get any smarter and more sophisticated than this.”

Tetlock, a social psychologist at the University of Pennsylvania, recently responded to my questions about his book and related topics.

Horgan: You’re renowned for showing in your 2005 book Expert Political Judgment how hard it is to predict social phenomena. And yet your new book is much more optimistic about the possibility of accurate prediction. Is there anything in your first book that you take back?

Tetlock: Nothing springs to mind. The contradictions are, in my view, more apparent than real. There are two big geopolitical forecasting-tournament data sets: one linked to Expert Political Judgment, summarizing tournaments that ran from 1985 to 2002, and the other linked to GJP (Good Judgment Project), otherwise known as the IARPA (Intelligence Advanced Research Projects Agency) tournament, which ran from 2011 to 2015.

There are, of course, important similarities. Both tournaments pose questions about possible futures well specified enough to pass the clairvoyance test. And they ask forecasters to make judgments along probability scales.

But there are big differences, and these differences account for the different findings and emphases in interpretation. (Was it Heisenberg who said: “We know nature only as it is exposed to our methods of questioning”? Regardless, that truism certainly applies to forecasting tournaments.)

The cumulative effect of all these differences was that there were more opportunities and incentives for forecasters to shine in the later work than in the earlier work. Consider this list of differences:

(1) the shortest questions in the earlier work (asking people to look out about one year) were longer than all but the very longest questions in the later work (the vast majority of questions that superforecasters answered required looking out several months but less than a year);

(2) forecasters in the earlier work wanted anonymity whereas forecasters in the later work wanted to be recognized on leaderboards;

(3) forecasters in the earlier work rarely had opportunities to update their beliefs whereas forecasters in the later work were strongly encouraged to update their probability estimates as often as they felt the news warranted.

Put differently, the much more publicly competitive IARPA tournaments pressure people to be more open-minded, to be foxier, than they normally are (more so than the EPJ tournaments did), because they raise the reputational risks of closed-mindedness.

I suppose that is why people who have read both Expert Political Judgment and Superforecasting see the latter book as more upbeat, more about lighting candles than cursing the darkness. That is probably a pretty fair assessment. Deep down, I see the two books as complementary, not contradictory.

Horgan: You have discovered that certain people possess traits that make them “superforecasters,” who are much better than average at predicting social events. Can these traits be automated, that is, be codified in algorithms?

Tetlock: We describe in the book an opportunity to discuss this problem with David Ferrucci, the creator of Watson (the artificial-intelligence system that became world champion at Jeopardy!). He agreed, for instance, that Watson would have little difficulty answering a question like: Which two Russian leaders traded jobs in the last five years? But he noted it would be quite another matter to answer the question: Will those same Russian leaders change jobs in the next five years? The second question is one that superforecasters would find pretty easy (I think) but that no artificial-intelligence system on the planet today could field in a compelling way. Why is the second question so much more difficult than the first? Because answering it requires a somewhat intricate causal model of the Russian political system, of the personalities involved, and of the evolving threats and opportunities they are likely to confront. It is not "just" a matter of scanning a massive database and homing in on the most plausible Bayesian-estimated answer. I put scare quotes around "just" because I do not in any way want to trivialize what an extraordinary achievement Watson is.

Horgan: Are you a believer in the power of Big Data to revolutionize the social sciences? Will social science ever be as precise and rigorous as physics?

Tetlock: I'm not sure about "revolutionizing" social science, but Big Data will clearly make it possible to answer many categories of questions that were previously unanswerable. We now have massive databases on interpersonal relations (e.g., Facebook), search behavior (Google) and consumer behavior (seemingly everywhere). Tangentially, companies routinely do things to all of us that the human-subjects review boards at universities would categorize as unconscionably unethical. Either university review boards are ridiculously hypersensitive or Big Data firms are ridiculously insensitive. I think it is a mix.

Horgan: Social theories and predictions can have an enormous impact on societies, as Marx’s impact on history demonstrates. Does this feedback factor contribute to the difficulty of social prediction? Is it possible to build models that take this factor into account?

Tetlock: I agree that self-fulfilling and self-negating prophecies do indeed "contribute to the difficulty of social prediction." These effects are difficult to measure and model but not always impossible. For instance, many of the questions asked in the most recent forecasting tournaments were conditional forecasts of the form: If the U.S. government (or another entity) does X or Y, how likely is outcome Z? Of course, it will only be possible to evaluate the empirical accuracy of forecasts along one branch of the conditional (the option that the decision-making entity embraces). The other branch becomes part of counterfactual history (we never get a chance to observe what would have happened if we had gone down that other path).

One could argue that forecasting tournaments do, however, shed some indirect light even on the accuracy of judgments about counterfactual history. After all, whose judgments about what would've happened do you trust more: those who were accurate in the actual world or those who were inaccurate?

Some readers might wonder why we should care about trying to construct indirect gauges of who is more likely to be correct in their judgments of counterfactual worlds. It turns out, though, that the assumptions we make about what would've happened in these counterfactual worlds underlie all the causal lessons we draw from history. If you believe the 2003 Iraq war was a mistake, that means you believe things would have worked out better in the counterfactual worlds in which the U.S. did not launch that invasion, worlds in which Saddam Hussein might still be in power. Don't forget: even if your counterfactual belief is widely shared, it is still a counterfactual belief, not a factual one.

Horgan: Surveys I’ve been carrying out for a dozen years show that about nine in ten Americans believe war will never be eradicated. I fear that this pessimistic belief will be self-fulfilling. Can you comment on this specific possibility and on the more general problem of self-fulfilling prophecies?

Tetlock: Too big a question for my taste, but I will hazard a few observations. The classic definition of a "state" is an organization that claims a monopoly on the use of force in a given territory. As long as the world is divided into competitive nation-states, each of which claims to be a law unto itself, and as long as the international system is "anarchic" (no world government with effective enforcement powers), there will be potential for war. But the optimist in me is heartened by how circumspect nuclear-armed states have been about even threatening to use nuclear weapons (even North Korea's bark seems to be much worse than its bite, so far). And it is interesting how rarely well-established democracies fight each other.

So I suppose this is a rather long-winded way of saying: I don't know and I don't think anyone on the planet does.

Horgan: The research you describe in Superforecasting was funded by the Department of Defense. Did you have any qualms about accepting military money? Are you concerned, more generally, about the dependence of American researchers on military funding?

Tetlock: IARPA placed no constraints on our ability to publish, and no classified information was involved. In these senses, we had as much freedom as we would have had if we had been supported by the National Science Foundation. (IARPA is, incidentally, part of the U.S. intelligence community, not part of the military. The larger question obviously still stands.)

I have a hard time imagining the National Science Foundation deciding to support something as deeply interdisciplinary as forecasting tournaments (which cross the boundaries of several sections of NSF: judgment and decision-making, social and individual-difference psychology, statistics, economics, political science).

My view is that forecasting tournaments deepen our understanding of how to generate realistic probability estimates and thus reduce the likelihood of calamitous intelligence errors of the sort that led to the 2003 Iraq war (where the intelligence community was egregiously overconfident in its assessment of the likelihood of finding active programs to produce weapons of mass destruction in Iraq, an overconfidence most vividly captured in the famous "slam dunk" quote). Insofar as our research reduces the likelihood of such errors in the future, it handily passes my cost-benefit test.

Horgan: Do you believe in free will? Why or why not? Does your belief or disbelief have any impact on your science?

Tetlock: This question is even further beyond my pay grade. If free will is an illusion (and there are good grounds for hypothesizing this), then it is a damn convincing one, and one that serves critical functions in the existing social order (an essential underpinning of moral responsibility and accountability).

Horgan: Psychology and the social sciences have taken a beating lately, as many well-publicized claims have turned out to be exaggerated or false. What can these fields do to restore their reputations?

Tetlock: Forecasting tournaments are radically transparent: the funding agency collected all of the data submissions at 9 a.m. EST each day the tournaments were running. There was no room for fudging, for claiming that your probability estimates were really more accurate than portrayed. So I do recommend this model of inquiry.

More generally, I think that the replication efforts of the Open Science project are a good step toward restoring the field's reputation. I should also note that I was a co-author of an article, which appeared in Behavioral and Brain Sciences last month, that makes the case for greater ideological diversity in social psychology and social science (a checks-and-balances argument). But this is a problem that has been building up for a long time, and it will take a long time to clean things up.

Horgan: Do you have any advice for the legions of researchers and officials who are trying to predict the effects of fossil-fuel consumption on human well-being?

Tetlock: Humility.

Horgan: Would you describe yourself as an optimist or pessimist about the prospects for humanity?

Tetlock: I suppose I would use the term used in Superforecasting: a cautious optimist.

Addendum: Tetlock is visiting my school, Stevens Institute of Technology in Hoboken, N.J., to give a talk on October 14 at 5 p.m. It is free and open to the public.