Opinion, arguments & analyses from the editors of Scientific American

# World Cup Prediction Mathematics Explained

The views expressed are those of the author and are not necessarily those of Scientific American.

Brazil vs. England in a "friendly" in Rio de Janeiro

The World Cup is back, and everyone’s got a pick for the winner. Gamblers have been predicting the outcome of sporting contests since the first foot race across the savannah, but in recent years a unique type of statistical analysis has taken over the prediction business. Everyone from Goldman Sachs to Bloomberg to Nate Silver’s FiveThirtyEight has an online World Cup predictor that uses numbers, not hunches, to generate precise probabilities for match outcomes. Goldman Sachs, for instance, gives host nation Brazil a 48.5 percent chance of winning it all; FiveThirtyEight puts the odds at 45 percent while Bloomberg Sports has concluded there’s just a 19.9 percent chance of a triumph for the Seleção.

Where do these numbers come from? All statistical analysis must start with data, and these soccer prediction engines skim results from former matches. A fair bit of judgment is necessary here. Big international soccer tournaments only come around every so often, so the analysts have to choose how to weight team performance in lesser events such as international “friendlies,” where nothing of consequence is at stake. The modelers also have to decide how far back to pull data from—does Brazil’s proud soccer history matter much when its oldest player is 34?—and how to rate the performance of individual players during their time playing for club teams such as Manchester United or Real Madrid.
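One way to picture these judgment calls is as weights attached to each past match. The sketch below is illustrative only: the `Match` fields, the `friendly_weight` discount and the `half_life_years` decay are hypothetical choices made for this example, not parameters used by Goldman Sachs, Bloomberg or FiveThirtyEight, and the sample matches are invented.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    goals_scored: int
    years_ago: float
    competitive: bool  # False for a "friendly"


def weighted_goal_rate(matches: List[Match],
                       friendly_weight: float = 0.5,
                       half_life_years: float = 4.0) -> float:
    """Average goals per match, discounting friendlies and older results."""
    total = 0.0
    weight_sum = 0.0
    for m in matches:
        importance = 1.0 if m.competitive else friendly_weight
        # A result loses half its weight every `half_life_years`.
        recency = 0.5 ** (m.years_ago / half_life_years)
        weight = importance * recency
        total += weight * m.goals_scored
        weight_sum += weight
    return total / weight_sum


# Invented results for a single team, purely for illustration.
history = [
    Match(goals_scored=3, years_ago=0.0, competitive=True),
    Match(goals_scored=1, years_ago=0.0, competitive=False),
]
rate = weighted_goal_rate(history)

if __name__ == "__main__":
    print(f"weighted goals per match: {rate:.2f}")
```

Changing `friendly_weight` or `half_life_years` changes the estimated scoring rate, which is one reason different analysts can reach different numbers from the same match history.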

Wherever the data comes from, the modeler now has to incorporate it into a model. Frequently, the modeler translates the question of “who is going to win?” into the form “how many goals will team X score against team Y?” And for this, she relies on a statistical tool called a bivariate Poisson regression.

Those are three unfamiliar words. Let’s unpack them one by one. “Bivariate” means there are two interrelated variables from which we are trying to predict a single outcome: team X’s performance against team Y. “Regression” just means that we’re fitting a set of data to a model. “Poisson” is the interesting one.

Imagine that you’re standing by the side of the road and you want to know how many cars go by in a minute. First, you’d take some data. Armed with a stopwatch and a counter, you’d see that 15 go by one minute, 18 the next, just four the third minute. Do this for enough minutes and you’d begin to see a pattern build up, a Poisson distribution, named for the French mathematician who invented it in order to estimate the frequency of false convictions.

Poisson distributions with a mean of 1, 4 and 10
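As a rough sketch of the roadside example, the snippet below estimates the average number of cars per minute from the three counts mentioned above and evaluates the standard Poisson formula, P(k) = e^(-mean) * mean^k / k!, for the three means shown in the figure. The function name `poisson_pmf` is just a label chosen for this example, and three minutes of data is far too little for a serious estimate.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of seeing exactly k events when the average count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# The three one-minute car counts from the roadside example.
counts = [15, 18, 4]
# For a Poisson model, the best-fit mean is simply the sample average.
estimated_mean = sum(counts) / len(counts)

# Probabilities of 0 through 5 events for the means plotted in the figure.
table = {mean: [poisson_pmf(k, mean) for k in range(6)] for mean in (1, 4, 10)}

if __name__ == "__main__":
    print(f"estimated cars per minute: {estimated_mean:.2f}")
    for mean, probs in table.items():
        print(mean, [round(p, 3) for p in probs])
```

With a mean of 1 the probabilities are bunched near zero, while with a mean of 10 counts of 0 through 5 are all fairly unlikely.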

The number of goals in a game also tends to be distributed according to the Poisson distribution. A given team may be most likely to score one or two goals, sometimes zero or three, and much less frequently four or five (or more). Modelers will map the data from a team’s previous performance onto a Poisson distribution of the number of goals they are likely to score against their opponent.
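To see how goal distributions turn into win probabilities, here is a minimal sketch that treats each team's goal count as an independent Poisson variable and adds up the probability of every scoreline. That independence is a simplifying assumption made for this example; a bivariate Poisson model additionally lets the two scores be correlated. The scoring rates of 1.8 and 0.9 goals per match are hypothetical and not fitted to any real data.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k goals when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def match_probabilities(rate_x: float, rate_y: float, max_goals: int = 15):
    """Return (P(X wins), P(draw), P(Y wins)) for independent Poisson scores.

    Scorelines above `max_goals` are ignored; their probability is negligible
    for realistic scoring rates.
    """
    win_x = draw = win_y = 0.0
    for goals_x in range(max_goals + 1):
        for goals_y in range(max_goals + 1):
            p = poisson_pmf(goals_x, rate_x) * poisson_pmf(goals_y, rate_y)
            if goals_x > goals_y:
                win_x += p
            elif goals_x == goals_y:
                draw += p
            else:
                win_y += p
    return win_x, draw, win_y


# Hypothetical average goals per match for team X and team Y.
win_x, draw, win_y = match_probabilities(1.8, 0.9)

if __name__ == "__main__":
    print(f"X wins {win_x:.1%}, draw {draw:.1%}, Y wins {win_y:.1%}")
```

Running the same calculation for every fixture and following the winners through the bracket is, in outline, how single-match probabilities can be turned into a tournament-winning probability.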

And the gamblers? As of this writing the online sportsbook Betfair has Brazil as a 3-to-1 favorite, an implied probability of 24.4 percent. If you believe the analysts at Goldman Sachs or FiveThirtyEight, who have Brazil as nearly a 50 percent favorite, a betting opportunity has opened up for you. Of course, presumably all those people betting on Brazil at 3-to-1 odds have also read the Goldman Sachs and FiveThirtyEight analyses.

The question becomes: What do they know that the statisticians don’t?
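The gap between the bookmaker and the models can be made concrete with a little arithmetic. The sketch below assumes simple "N-to-1 against" odds with no commission, which is a simplification of how a real exchange prices bets; the function names are chosen for this example. Odds of exactly 3-to-1 imply a probability of 25 percent, close to the 24.4 percent quoted above.

```python
def implied_probability(odds_against: float) -> float:
    """Convert 'N-to-1 against' odds into the probability they imply."""
    return 1.0 / (odds_against + 1.0)


def expected_profit(true_probability: float, odds_against: float,
                    stake: float = 1.0) -> float:
    """Average profit per bet if the outcome occurs with `true_probability`."""
    winnings = true_probability * odds_against * stake
    losses = (1.0 - true_probability) * stake
    return winnings - losses


# Brazil's title chances according to the three models cited in the article.
model_probabilities = {
    "Goldman Sachs": 0.485,
    "FiveThirtyEight": 0.45,
    "Bloomberg Sports": 0.199,
}

bookmaker_probability = implied_probability(3.0)  # 0.25

if __name__ == "__main__":
    print(f"implied by 3-to-1 odds: {bookmaker_probability:.1%}")
    for name, p in model_probabilities.items():
        print(f"{name}: expected profit per unit staked {expected_profit(p, 3.0):+.2f}")
```

A positive expected profit means the bet looks attractive if that model is right; a negative one means the bookmaker's price is already too short.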

Image by Digo Souza on Flickr

About the Author: Michael Moyer is the editor in charge of space and physics coverage at Scientific American. Follow on Twitter @mmoyr.



1. RobFromLoveland 4:38 pm 06/11/2014

A fifty percent favorite would seem to be high. Nate Silver has a great track record. As far as Goldman Sachs, we only need look to the last recession to see evidence of their predictive skills, math or otherwise. There are 32 teams, and many, including the US, can be discounted. But there would seem to be at least a handful of teams with strong enough personnel to win the event. Brazil’s past three World Cups won’t mean much, but home crowds and friendly cities will play quite a role. The host country has won 6 (of 19) but only once since 1978 (France in 1998).
It’s sketchy to put much reliance on distribution of goals scored. The national teams have played relatively few games, and in these the coaches were putting more emphasis on building a functioning team than on wins. Heavy favorites tend not to be good bets (recall the Belmont Stakes). At even money, I’d take the field against Brazil.

2. jacktheho 10:58 pm 06/11/2014

Ya it will be interesting to see who wins this moro, but Im finna keep tracking my bets for World Cup on http://www.betforsomething.com in order 2 collect my winnings

3. TonyFr 4:23 am 06/12/2014

48.5% for Brazil seems indeed quite high. An entirely different approach is this world cup stock market game at http://prediction.zone/stockmarket/worldcup2014/info

4. 13inches 1:24 pm 06/13/2014

Can anyone explain mathematically why anyone would even WATCH a game as dumb as soccer ? Three hours of grown men running around in the middle of a big lawn tripping each other and the final score always ends in a zero to zero tie ?

5. llirbo 8:50 pm 06/13/2014

13inches? That’s pretty optimistic, I’d say.

Football is not nearly as dumb as American Football, or worse yet Baseball.

6. OpenToLearning 10:59 pm 06/13/2014

What the gamblers “know” that the statisticians don’t is that the house always wins. And to win, the house merely sets its odds based on the wagerers’ preferences: the house may “know” that Brazil is pretty much an even favorite, but as long as people are willing to bet that the hosts have only a one in four chance, that’s how they set the odds, plus the all-important vigorish. If more people start picking Brazil, the payoff for a correct choice will go down in an inverse proportion. The studies you quote are predictors of results over many trials; the gambling odds are merely reflective of public opinion. Two different things, as I should hope a scientist would understand by now!

7. jhlarizzatti 8:10 am 06/14/2014

Good day, Everyone.
100% of the times Brazil played a final match at home, we lost. If you think in a geostatistics way, where geographic position matters to the significance of the data (and I think it does matter), modelling soccer games is not so easy.

Good luck to all!

8. markmitry 9:59 pm 06/17/2014

by the same reasons we can not explain your silly comment.