Evelyn Lamb writes about mathematics and other cool stuff.
A view from the Louisiana State-Alabama college football game, November 3, 2012. Credit: flickr/thepipe26
The Bowl Championship Series (BCS) college football rankings are in turmoil. For two weeks in a row, the top-ranked team has been upset by an underdog from central Texas. (Full disclosure: As a Baylor alum who is the daughter and granddaughter of Aggies, I might be just a little smug.) The BCS rankings are a pretty big deal: college football doesn't have a championship tournament (yet), and the rankings determine which teams play in the biggest bowl games, including the national championship game between the number one and number two teams. Determining the rankings of the 120 college teams in the Football Bowl Subdivision (FBS) is not easy, especially because each team plays only about twelve games, so it never faces roughly 90 percent of the field.
Currently, the BCS rankings are constructed from three components: the Harris Interactive poll and the Coaches poll (both surveys of various college football experts) and an average of six different computer ranking systems. Those ingredients are stirred up with some weighted averaging and outlier disposal to create the BCS rankings we know and love (to hate).
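To make the "weighted averaging and outlier disposal" a little more concrete, here is a minimal Python sketch of how the three components could be combined. The specifics are assumptions for illustration, based on the commonly described version of the formula rather than on anything in Karl's paper: each poll counts as a team's share of the maximum possible points, each of the six computers awards 25 points for first place down to 1 point for 25th, each team's best and worst computer scores are thrown out, and the three components get equal weight. The function names are made up.

```python
# Sketch of combining poll and computer components into one score.
# ASSUMED details (not from the article): equal one-third weights, a
# 25-points-for-#1 computer scale, and dropping each team's best and
# worst computer score as outliers.


def poll_share(points: float, max_points: float) -> float:
    """A team's poll points as a fraction of the maximum possible."""
    return points / max_points


def computer_share(ranks: list[int]) -> float:
    """Combine six computer ranks for one team (assumed scheme).

    Rank 1 earns 25 points, rank 25 earns 1, unranked (26+) earns 0.
    The highest and lowest scores are discarded, and the remaining four
    are divided by their maximum possible total, 100.
    """
    if len(ranks) != 6:
        raise ValueError("expected exactly six computer rankings")
    points = sorted(max(0, 26 - r) for r in ranks)
    middle_four = points[1:-1]  # drop the lowest and highest score
    return sum(middle_four) / 100


def bcs_style_average(harris_points: float, harris_max: float,
                      coaches_points: float, coaches_max: float,
                      computer_ranks: list[int]) -> float:
    """Equal-weight average of the three components (assumed weights)."""
    components = (
        poll_share(harris_points, harris_max),
        poll_share(coaches_points, coaches_max),
        computer_share(computer_ranks),
    )
    return sum(components) / len(components)


# Hypothetical team: ranked #1 by four computers, #2 and #3 by the other two.
print(computer_share([1, 1, 2, 1, 3, 1]))  # 0.99: the #3 score and one #1 score are dropped
```

The outlier step is the `points[1:-1]` slice: a single computer that loves (or hates) a team can't move its score on its own.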
The rankings, especially the computer models, can be mysterious. What factors go in, and can we trust the numbers that come out? What are some of the pitfalls that rankings creators need to avoid? Last year, Andrew Karl, a newly minted Ph.D. statistician from Arizona State University, asked himself these questions and decided to create his own ranking system under the BCS rules for computer models in order to understand the process better. His paper about the model, and about how different modeling choices affect the outcome, was then published in the Journal of Quantitative Analysis in Sports (a preprint is also available).
Karl says that the biggest weakness of the BCS rankings system is that it allows computer models to use only a team's binary win-loss record, not the margin of victory, presumably to reduce the temptation for teams to run up the score against much weaker opponents. "From a stats perspective, it's a more challenging notion than using, for example, margin of victory," says Karl. "There's a nonlinear component to it." (Nonlinear is math-ese for "much harder to solve": there's usually no tidy formula for the answer.) One problem is that when a team is undefeated, many models that use only win-loss data will push its rating to infinity, giving it infinite odds of beating anyone. Oops.
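Here's a small Python sketch of that undefeated-team problem. It uses a generic logistic (Bradley-Terry-style) win-loss model, which is an assumption for illustration and not necessarily what any BCS computer does. Three hypothetical teams play a round robin: A beats B and C, and B beats C. The log-likelihood, which measures how well a set of ratings explains the results, keeps improving as A's rating is pushed up and C's is pushed down, so there is no finite "best" rating for A.

```python
import math

# Hypothetical round robin as (winner, loser) pairs. A is undefeated, C is winless.
GAMES = [("A", "B"), ("A", "C"), ("B", "C")]


def log_sigmoid(x: float) -> float:
    """log(1 / (1 + e^-x)); numerically fine for the moderate x used here."""
    return -math.log1p(math.exp(-x))


def log_likelihood(ratings: dict[str, float]) -> float:
    """Assumed logistic model: P(winner beats loser) = sigmoid(r_winner - r_loser)."""
    return sum(log_sigmoid(ratings[w] - ratings[l]) for w, l in GAMES)


def spread(t: float) -> dict[str, float]:
    """Ratings that keep the A > B > C order but stretch the gaps by t."""
    return {"A": t, "B": 0.0, "C": -t}


for t in [1, 2, 4, 8, 16]:
    # The log-likelihood climbs toward 0 but never reaches it.
    print(t, log_likelihood(spread(t)))
```

Each doubling of `t` makes the season's results look more likely, so an optimizer chasing the maximum would send A's rating off to infinity.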
Karl's approach uses a "generalized linear mixed model." Wins and losses over the course of the season create a rating for each team. A bit confusingly, a rating is not the same as a ranking. The ordering of the ratings determines the ranking, but the rating contains more information. Teams that differ by one in rank could have ratings that are very close or very far apart. When the ratings are very close, the difference between them might be within the margin of error, but the ranking will not reflect that. If the model were perfect and the world contained no surprises, a team with a higher rating would always beat a team with a lower rating. The term "generalized" means that the thing being modeled, a win or a loss, does not follow a normal distribution, the famous bell curve, and the term "mixed" means that the model includes some random effects, since the real world throws in a few curveballs. Unfortunately, the model ends up spitting out a pretty scary calculation: an integral whose dimension is the same as the number of teams in the rankings, which this year is 120 if you restrict to Football Bowl Subdivision teams. (If you took multivariable calculus, remember how hard 3-dimensional integrals can be? Adding 117 more dimensions doesn't make things easier.) Karl can't say for sure how his model differs from the computer models used by the BCS, as most of them are proprietary.
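To see where the big integral comes from, here's a toy Python version with three hypothetical teams. It assumes, for illustration only, that each team's rating is a normal random effect and that the chance of team i beating team j, given the ratings, is Φ(r_i - r_j), the bell-curve cumulative distribution function (a "probit" link). The likelihood of the season is the product of those game probabilities averaged over every possible set of ratings, which means one integral dimension per team. This sketch estimates it by plain Monte Carlo sampling, which is just one way to approximate such an integral and not necessarily one of the methods Karl compared.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def marginal_likelihood_mc(teams, games, sigma=1.0, draws=20_000, seed=0):
    """Monte Carlo estimate of the average, over ratings ~ Normal(0, sigma^2),
    of prod over games of Phi(r_winner - r_loser).

    The integral has one dimension per team; each draw samples a full set
    of ratings (one point in that space) and evaluates the product.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        ratings = {team: rng.gauss(0.0, sigma) for team in teams}
        prob = 1.0
        for winner, loser in games:
            prob *= normal_cdf(ratings[winner] - ratings[loser])
        total += prob
    return total / draws


# Hypothetical round robin as (winner, loser) pairs.
TEAMS = ["A", "B", "C"]
GAMES = [("A", "B"), ("A", "C"), ("B", "C")]
print(marginal_likelihood_mc(TEAMS, GAMES))
```

Shrinking `sigma` toward zero makes every game a coin flip, and the estimate approaches 0.5^3 = 0.125. Why not just evaluate the integral on a grid? Even a coarse grid with 10 points in each direction would need 10^120 evaluations for 120 teams, which is why the choice of approximation method matters so much here.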
Karl actually developed several different models, each varying the distribution of the ratings, the amount and type of random variation, the method chosen to approximate the integral, and even whether to rank only the FBS teams or all the teams in Division I, including teams in smaller conferences such as the Ivy League and the Big Sky Conference. Karl found that, in general, the changes due to these choices were small compared with the standard errors of the ratings, but of course even a small difference in ratings can change the rankings of two teams, possibly affecting who gets to play for the championship.
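The ratings-versus-rankings point can be sketched in a few lines of Python. Given ratings and their standard errors, the function below sorts teams into a ranking and flags neighboring pairs whose rating gap is less than about two standard errors of the difference. The two-standard-error cutoff, the made-up numbers, and the assumption that the two teams' errors are independent are all simplifications for illustration; in a real model the ratings are correlated.

```python
import math


def rank_and_flag(ratings, std_errors, z=2.0):
    """Sort teams by rating and flag adjacent pairs that are statistically close.

    A pair is flagged when the rating gap is below z times the standard error
    of the difference, computed here as if the two errors were independent.
    """
    order = sorted(ratings, key=ratings.get, reverse=True)
    close_pairs = []
    for upper, lower in zip(order, order[1:]):
        gap = ratings[upper] - ratings[lower]
        se_gap = math.sqrt(std_errors[upper] ** 2 + std_errors[lower] ** 2)
        if gap < z * se_gap:
            close_pairs.append((upper, lower))
    return order, close_pairs


# Made-up ratings and standard errors for three hypothetical teams.
ratings = {"Team A": 2.1, "Team B": 1.9, "Team C": 0.8}
std_errors = {"Team A": 0.3, "Team B": 0.3, "Team C": 0.3}
print(rank_and_flag(ratings, std_errors))
# (['Team A', 'Team B', 'Team C'], [('Team A', 'Team B')])
```

Team A and Team B sit one spot apart in the ranking, but their ratings are close enough that a different modeling choice could plausibly swap them; Team C is clearly behind.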
If the computer models are so problematic, why use them at all? Karl says that this year, Florida State (FSU) highlights the utility of the computer rankings. FSU has only one loss, but it has played an easy schedule. The two human polls rank FSU fifth and sixth, but the computer rankings, both the BCS's and Karl's, have it down around 15th; the computer models are more sensitive to strength of schedule, and less emotional about the win-loss record, than the humans surveyed.
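For a feel for how a computer that sees only wins and losses can still reward a tougher schedule, here's a Python sketch of the Colley Matrix method, one of the publicly documented systems among the BCS computers (it is not Karl's model). Each team i gets the equation (2 + n_i) r_i - Σ_j n_ij r_j = 1 + (w_i - l_i)/2, where n_i is the number of games it played, n_ij is the number of games against team j, and w_i and l_i are its wins and losses. In the hypothetical mini-season below, teams A and B are both 1-0, but A beat a .500 team while B beat a winless one.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def colley_ratings(teams, games):
    """Colley Matrix ratings from (winner, loser) pairs, using wins and losses only."""
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser in games:
        wi, li = idx[winner], idx[loser]
        C[wi][wi] += 1.0  # each game played adds to the diagonal
        C[li][li] += 1.0
        C[wi][li] -= 1.0  # each meeting links the two teams' ratings
        C[li][wi] -= 1.0
        b[wi] += 0.5      # b_i = 1 + (wins - losses) / 2
        b[li] -= 0.5
    return dict(zip(teams, solve_linear(C, b)))


# Hypothetical mini-season: A and B both finish 1-0, but A's opponent (C)
# went 1-1 while B's opponent (D) went 0-2.
teams = ["A", "B", "C", "D"]
games = [("A", "C"), ("B", "D"), ("C", "D")]
for team, rating in sorted(colley_ratings(teams, games).items(), key=lambda kv: -kv[1]):
    print(team, round(rating, 4))
```

Here A's rating works out to exactly 37/56 and B's to 33/56, so the team with the tougher opponent comes out ahead despite identical records. The ratings also stay finite for undefeated teams, which sidesteps the infinite-odds problem from the simple logistic model.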
Karl says he’s not a betting man, but he sometimes does informal picks with his friends. “None of my other friends have their own models,” he says, but he acknowledges that his model is probably not one of the best ones—he was working within the constraints of the BCS computer ranking system rules. If he were really out to win big, he would use a model that takes margin of victory into account.
Right now, Karl’s rankings aren’t too far off from those of the BCS. The BCS top five teams are (in order) Notre Dame, Alabama, Georgia, Florida and Oregon. Karl’s best model has Notre Dame, Florida, Alabama, Oregon and Stanford at the top. They aren’t too different, but even the small changes would make a big difference to the teams who would end up in different bowl games as a result. With only one undefeated team left, who knows what might happen in the final weeks? (Sorry, Fighting Irish fans, but I like the chaos: Go USC!)