Observations


Opinion, arguments & analyses from the editors of Scientific American

Statistician Creates Alternate Model for College Football Rankings

The views expressed are those of the author and are not necessarily those of Scientific American.





A view from the Louisiana State-Alabama college football game, November 3, 2012. Credit: flickr/thepipe26

The Bowl Championship Series (BCS) college football rankings are in turmoil. For two weeks in a row, the top-ranked team has been upset by an underdog from central Texas. (Full disclosure: As a Baylor alum who is the daughter and granddaughter of Aggies, I might be just a little smug.) The BCS rankings are a pretty big deal: college football doesn’t have a championship tournament (yet), and the rankings determine which teams play in the biggest bowl games, including the national championship game between the number one and number two teams. Determining the rankings of the 120 college teams in the Football Bowl Subdivision (FBS) is not easy, especially because teams only play about twelve games, meaning they don’t play 90 percent of the field.
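
The "90 percent of the field" figure can be checked with a line of arithmetic. The sketch below assumes every one of the twelve games is against a different FBS opponent; the function name is a hypothetical one chosen for illustration.

```python
def fraction_unplayed(num_teams: int, games_played: int) -> float:
    """Share of possible opponents a team never faces in one season.

    Assumes each game is against a distinct opponent from the same pool.
    """
    opponents = num_teams - 1  # a team cannot play itself
    return 1 - games_played / opponents


# 120 FBS teams and about twelve games per team, as stated in the article.
share = fraction_unplayed(num_teams=120, games_played=12)
print(f"{share:.1%} of the field goes unplayed")
```

With these inputs the share comes out just under 90 percent, which is why the rankings have to infer so much from so few head-to-head results.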

Currently, the BCS rankings are constructed from three components: the Harris Interactive poll and the Coaches poll (both surveys of various college football experts) and an average of six different computer ranking systems. Those ingredients are stirred up with some weighted averaging and outlier disposal to create the BCS rankings we know and love (to hate).
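
The "weighted averaging and outlier disposal" can be sketched roughly as follows. The article does not give the exact formula, so this is an illustration under stated assumptions: each component is expressed as a score between 0 and 1, a team's highest and lowest computer scores are discarded, and the three components are weighted equally. The function names and the sample numbers are hypothetical.

```python
from statistics import mean


def trimmed_computer_score(scores: list[float]) -> float:
    """Average the computer scores after dropping the highest and lowest.

    Assumption: outlier disposal means discarding one score at each extreme.
    """
    if len(scores) < 3:
        raise ValueError("need at least three scores to trim both extremes")
    ordered = sorted(scores)
    return mean(ordered[1:-1])


def combined_score(harris: float, coaches: float,
                   computer_scores: list[float]) -> float:
    """Equal-weight average of the two polls and the trimmed computer score."""
    return mean([harris, coaches, trimmed_computer_score(computer_scores)])


# Made-up scores for one team: two poll shares and six computer scores.
computers = [0.96, 0.92, 1.00, 0.88, 0.92, 0.60]
team_score = combined_score(harris=0.95, coaches=0.93,
                            computer_scores=computers)
```

In this example the outlying 0.60 and the perfect 1.00 are both discarded, so a single unusual computer model cannot swing the team's combined score on its own.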

The rankings, especially the computer models, can be mysterious. What factors go in, and can we trust the numbers that come out? What are some of the pitfalls that rankings creators need to avoid? Last year, Andrew Karl, a newly minted Ph.D. statistician from Arizona State University, asked himself these questions and decided to create his own ranking system using the BCS rules for computer models in order to understand the process better. The paper he wrote about his model and the effects of different modeling choices on the outcome was then published in the Journal of Quantitative Analysis in Sports (a preprint is also available).

Karl says that the biggest weakness of the BCS rankings system is that it only allows computer models to take into account a team’s binary win-loss record, not the margin of victory, presumably to decrease the chance that teams will run up the score when playing a much weaker opponent. “From a stats perspective, it’s a more challenging notion than using, for example, margin of victory,” says Karl. “There’s a nonlinear component to it.” (Nonlinear is math-ese for “impossible to solve.”) One problem is that when a team is undefeated, many models that use only win-loss data will assign them infinite odds of winning. Oops.
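
The infinite-odds problem shows up in a few lines of arithmetic. The sketch below is not Karl's model: it uses the naive log-odds of a team's raw win fraction, and the function names and the `prior_games` pseudo-count are hypothetical choices made for illustration.

```python
import math


def log_odds(wins: int, losses: int) -> float:
    """Naive log-odds of winning, estimated from the raw win-loss record."""
    if wins + losses == 0:
        raise ValueError("team has not played any games")
    if losses == 0:
        return math.inf   # undefeated: win fraction is 1, odds are infinite
    if wins == 0:
        return -math.inf  # winless: win fraction is 0
    return math.log(wins / losses)  # log(p / (1 - p)) with p = wins / games


def shrunk_log_odds(wins: int, losses: int, prior_games: float = 1.0) -> float:
    """Log-odds after adding pseudo-wins and pseudo-losses (an assumption).

    Pulling every record toward .500 keeps the estimate finite.
    """
    return math.log((wins + prior_games) / (losses + prior_games))


undefeated = log_odds(11, 0)         # diverges
even_record = log_odds(6, 6)         # exactly zero: even odds
tempered = shrunk_log_odds(11, 0)    # finite
```

Pulling estimates toward the middle is one common way to keep ratings finite. Whether a given ranking model does this, and how, depends on the model.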

Karl’s approach uses a “generalized linear mixed model.” Wins and losses over the course of the season create a rating for each team. A bit confusingly, a rating is not the same as a ranking. The ordering of the ratings determines the ranking, but the rating contains more information. Teams that differ by one in rank could have ratings that are very close or very far apart. When the ratings are very close, the difference might be within the margin of error, but the ranking will not reflect that. If the model is perfect and the world contains no surprises, a team with a higher rating will always beat a team with a lower rating. The term “generalized” means that the ratings might not follow a normal distribution, the famous bell curve, and the term “mixed” means that the model includes some random effects, since the real world throws in a few curveballs. Unfortunately, the model ends up spitting out a pretty scary calculation: an integral whose dimension is the same as the number of teams in the rankings, which this year is 120 if you restrict to FBS teams. (If you took multivariable calculus, remember how hard 3-dimensional integrals can be? Adding 117 more dimensions doesn’t make things easier.) Karl can’t say for sure how his model differs from the computer models used by the BCS, as most of them are proprietary.
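
The difference between a rating and a ranking is easy to see in a small example. The team names and ratings below are made up, and the probit-style link (a normal CDF applied to the rating difference) is an assumption used only to turn a rating gap into a win probability; it is not taken from Karl's paper.

```python
from statistics import NormalDist


def rank_teams(ratings: dict[str, float]) -> list[str]:
    """Order teams from highest rating to lowest; the order is the ranking."""
    return sorted(ratings, key=ratings.get, reverse=True)


def win_probability(rating_a: float, rating_b: float) -> float:
    """Chance that team A beats team B under an assumed probit link."""
    return NormalDist().cdf(rating_a - rating_b)


# Hypothetical ratings: two teams nearly tied, one far behind.
ratings = {"Team A": 1.50, "Team B": 1.45, "Team C": 0.20}

ranking = rank_teams(ratings)                                   # A, B, C
p_a_over_b = win_probability(ratings["Team A"], ratings["Team B"])
p_b_over_c = win_probability(ratings["Team B"], ratings["Team C"])
```

Teams A and B sit one rank apart, and so do B and C, but the first matchup is close to a coin flip while the second is heavily lopsided. The ranking alone hides that.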

Karl actually developed several different models, each varying the distribution of the ratings, the amount and type of random variation, the chosen method of approximating the solution to the integral, and even whether to rank only the FBS teams or all the teams in Division I, including teams in smaller conferences such as the Ivy and Big Sky leagues. Karl found that in general, the changes due to these choices were small compared with standard errors in ratings, but of course even a small difference in ratings can change the rankings of two teams, possibly affecting who gets to play for the championship.
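
To give a feel for what approximating such an integral involves, here is a one-dimensional toy version. It is not one of the approximation methods from Karl's paper: it applies a plain midpoint rule to the average of a normal CDF over a single normally distributed random effect, a case that happens to have a known closed form to check against. All names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def marginal_win_probability(mu: float, sigma: float,
                             grid_points: int = 2001,
                             width: float = 8.0) -> float:
    """Midpoint-rule estimate of E[Phi(mu + sigma * Z)] for Z ~ N(0, 1)."""
    step = 2 * width / grid_points
    total = 0.0
    for i in range(grid_points):
        z = -width + (i + 0.5) * step  # midpoint of the i-th cell
        total += STD_NORMAL.cdf(mu + sigma * z) * STD_NORMAL.pdf(z) * step
    return total


def closed_form(mu: float, sigma: float) -> float:
    """Exact value of the same integral: Phi(mu / sqrt(1 + sigma^2))."""
    return STD_NORMAL.cdf(mu / math.sqrt(1 + sigma ** 2))


approx = marginal_win_probability(mu=0.5, sigma=1.0)
exact = closed_form(mu=0.5, sigma=1.0)

# A grid with only 20 points per dimension would need this many
# evaluations in 120 dimensions, which is why grids do not scale.
grid_evaluations_120d = 20 ** 120
```

In one dimension a couple of thousand grid points are plenty. In 120 dimensions the same brute-force grid is hopeless, so the choice of a cleverer approximation becomes a modeling decision in its own right.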

If the computer models are so problematic, why use them at all? Karl says that this year, Florida State (FSU) highlights the utility of the computer rankings. FSU has only one loss, but it has played an easy schedule. The human polls have FSU ranked five and six, but the computer rankings—both the BCS’s and Karl’s—have it down around 15; the computer models are more sensitive to the strength of schedule, and less emotional about the win-loss record, than the humans surveyed.

Karl says he’s not a betting man, but he sometimes does informal picks with his friends. “None of my other friends have their own models,” he says, but he acknowledges that his model is probably not one of the best ones—he was working within the constraints of the BCS computer ranking system rules. If he were really out to win big, he would use a model that takes margin of victory into account.

Right now, Karl’s rankings aren’t too far off from those of the BCS. The BCS top five teams are (in order) Notre Dame, Alabama, Georgia, Florida and Oregon. Karl’s best model has Notre Dame, Florida, Alabama, Oregon and Stanford at the top. They aren’t too different, but even the small changes would make a big difference to the teams who would end up in different bowl games as a result. With only one undefeated team left, who knows what might happen in the final weeks? (Sorry, Fighting Irish fans, but I like the chaos: Go USC!)

About the Author: Evelyn Lamb is a postdoc at the University of Utah. She writes about mathematics and other cool stuff. Follow on Twitter @evelynjlamb.


5 Comments

  1. dubay.denis 12:23 pm 11/22/2012

    Nice story. Go Irish!

  2. CherryBombSim 8:11 pm 11/24/2012

    I had no idea that the computer rankings had been nerfed so badly before I read this. No margin of victory allowed, so obviously you can’t use anything that correlates strongly with it, either. Stats that might tell you something like, well, how good a team is. Handicapping games based only on won-lost records is nuts; no matter how good your model is, you simply don’t have enough data points during a season to make good predictions.

  3. denisosu 3:56 pm 11/27/2012

    Margin of victory is an interesting one. You’d think that you want the model to find the best team. But take an extreme case: a 45-yard field goal on a windy day that’s the last kick of the game, with the kicking team down 21-19.
    If you just want to find the best team, then this final kick doesn’t tell you much. It’s a 50/50 kick, so it probably tells you very little about the kicking team’s strength, or even about the kicker: the best kickers might make that 6 times out of 10, the average kickers 5 out of 10. It certainly tells you nothing about how good the defending team is.
    So a good computer model to find the best team will look at the score, whether it be 21-19 or 22-21 and say “two pretty even teams, anyone could have won, we’ll rate them about equally”. And in terms of finding the best team, that would be quite accurate.
    But who wants that? Who wants games where the last-minute winning kick or Hail Mary touchdowns aren’t important? It would take the fun out of it.
    By focusing on W/L record, you create a system where every game is exciting; it matters if you win or lose much more than it matters how well you play. So even your underdog team can hope for some lucky wins and a high ranking, above teams that are “better”.
    The real problem with the rankings is that the top teams do not play enough tough games. A schedule that was more flexible, for example, that enabled the teams in the top 20 after 6 games to play at least their last 3 games against other top 20 teams, would give the computers and the human judges a lot more data to work with.

  4. patrickh74 1:22 pm 11/28/2012

    The only undefeated team?! Ohio State is undefeated and none of the players or coaches who caused the sanctions are still at the school. 11-0 Notre Dame gets CRUSHED by a way better offense and OSU is every bit as good on defense. Wait till next year. Almost the whole team, from top to bottom, is returning. I see 12 wins and a BCS championship approximately 12 months from now. Right it down. It came from here first (all you haters). And for the record, I am a Michigan fan who got to see firsthand how good the Buckeyes are AND will be again next year (maybe 2).

  5. patrickh74 1:23 pm 11/28/2012

    Write not right. Sorry
