I was commenting on a Wall St. Journal blog post on NCAA bracket math and figured I could actually elaborate on the math here.
For those of you who don’t know what it is, “March Madness” is a tournament of college basketball teams in the United States. The same approach could be used for the World Cup, chess tournaments, baseball seasons, etc.: anything where bettors assign strengths to teams and then some subset of those teams play each other. The games can be round robin, single elimination, or anything in between.
The Problem: A Betting Pool
Now suppose we want to run an office betting pool [just for fun, of course, as we don’t want to get in trouble with the law]. We want everyone to express some preferences about which teams are better and then evaluate who made the best predictions.
The contest will consist of 63 games total in a single-elimination tournament. (First round is 32 games, next 16, next 8, then 4, 2, and 1, totalling 63.) The big problem is that who plays in the second round depends on who wins in the first round. With 64 teams, there are $2^{63}$ possible ways the bracket can play out (two possible winners for each of the 63 games), a few too many to enumerate.
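As a quick check on the arithmetic, here’s a small Python sketch that halves the field each round and counts the complete brackets, assuming (as in the text) that every game has exactly two possible winners:

```python
# Games per round in a 64-team single-elimination bracket:
# each round pairs off the remaining teams, so it plays half of them.
teams = 64
games_per_round = []
while teams > 1:
    games_per_round.append(teams // 2)
    teams //= 2

total_games = sum(games_per_round)

# Each game has two possible winners, and once earlier results are fixed,
# every later matchup is determined, so there are 2^(number of games) brackets.
num_brackets = 2 ** total_games

print(games_per_round)  # [32, 16, 8, 4, 2, 1]
print(total_games)      # 63
print(num_brackets)     # 9223372036854775808
```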
You could do something like have everyone rank the teams and then somehow try to line those up. See the WSJ blog post for more ad hoc suggestions.
The Bradley-Terry Model
The Bradley-Terry model is a model for predicting the outcome of pairwise comparisons. Suppose there are $N$ items (teams, in this case) being compared. Each item $n$ gets a coefficient $\beta_n$ indicating how strong the team is, with larger numbers being better. The model then assigns the following probability to a matchup:

$$\Pr[\text{team } i \text{ defeats team } j] = \text{logit}^{-1}(\beta_i - \beta_j).$$
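The inverse logit is $\text{logit}^{-1}(x) = 1 / (1 + \exp(-x))$. For instance, if team $i$ and team $j$ have the same strength, that is $\beta_i = \beta_j$, then the probability is even, because $\text{logit}^{-1}(0) = 0.5$. If $\beta_i \gg \beta_j$, then the probability approaches 1 that team $i$ will defeat team $j$.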
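Here’s a minimal Python sketch of the model’s win probability. The function names `inv_logit` and `prob_defeats` are just illustrative choices, and the sign-based branch is only there so `math.exp` never overflows on large inputs:

```python
import math

def inv_logit(x: float) -> float:
    """Inverse logit (logistic function): maps any real number into (0, 1)."""
    # Branch on the sign so math.exp is only ever called on a non-positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def prob_defeats(beta_i: float, beta_j: float) -> float:
    """Bradley-Terry probability that a team of strength beta_i beats one of strength beta_j."""
    return inv_logit(beta_i - beta_j)

print(prob_defeats(0.0, 0.0))   # 0.5: equal strengths give an even matchup
print(prob_defeats(10.0, 0.0))  # very close to 1 when beta_i is much larger than beta_j
```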
As an aside, there have been all kinds of adjustments to this model. For instance, you can add an intercept term for home team advantage, and perhaps have this vary by team. You can imagine adding all sorts of other random effects for games. We can also add a third outcome for ties if the sport allows it.
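As one illustration of the home-advantage adjustment, a sketch might put an intercept inside the inverse logit. The parameterization is an assumption, not a standard: `alpha` is a hypothetical league-wide home-court term, and the optional `team_alpha` dictionary is a hypothetical way to let that term vary by home team.

```python
import math
from typing import Mapping, Optional

def inv_logit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def prob_home_wins(beta: Mapping[str, float], home: str, away: str,
                   alpha: float = 0.0,
                   team_alpha: Optional[Mapping[str, float]] = None) -> float:
    """Probability the home team beats the away team.

    alpha: hypothetical league-wide home-court intercept.
    team_alpha: hypothetical per-team adjustment added for the home team.
    """
    intercept = alpha
    if team_alpha is not None:
        intercept += team_alpha.get(home, 0.0)
    return inv_logit(intercept + beta[home] - beta[away])

# Two hypothetical, equally strong teams.
beta = {"A": 0.0, "B": 0.0}
print(prob_home_wins(beta, "A", "B"))             # 0.5 with no home advantage
print(prob_home_wins(beta, "A", "B", alpha=0.4))  # above 0.5: home side favored
```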
Scoring Predictions
Suppose we have each bettor assign a number $\beta_n$ to each team $n$. This vector $\beta$ of team strength coefficients determines the probability that team $i$ defeats team $j$ for any $i \neq j$.
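Since only differences between coefficients enter the model, one vector $\beta$ pins down the probability for every ordered pair of teams. This sketch just tabulates them; the list-of-lists layout and the example strengths are illustrative assumptions:

```python
import math
from typing import List, Optional, Sequence

def inv_logit(x: float) -> float:
    """Numerically stable inverse logit, 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def win_prob_matrix(beta: Sequence[float]) -> List[List[Optional[float]]]:
    """P[i][j] = Pr[team i defeats team j]; the diagonal is None (no self-matchups)."""
    n = len(beta)
    return [[None if i == j else inv_logit(beta[i] - beta[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical strengths for three teams (0-indexed).
beta = [1.0, 0.0, 0.5]
P = win_prob_matrix(beta)
for row in P:
    print(["  -  " if p is None else f"{p:.3f}" for p in row])
```

Adding the same constant to every coefficient leaves the whole table unchanged, which is why only relative strengths matter.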
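Suppose the games are numbered 1 to 63 and that in game $k$, the result is that team $i_k$ defeated team $j_k$. Then the score assigned to ratings $\beta$ of team strengths is:

$$\text{score}(\beta) = \sum_{k=1}^{63} \log \text{logit}^{-1}(\beta_{i_k} - \beta_{j_k}).$$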
Higher scores are better. In words, for each game you get a score equal to the log of the probability your ratings assigned to the team that actually won. The total score is just the sum of these individual game scores, so it’s the log of the total probability that your ratings assigned to what actually happened.
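The score can be computed directly from a list of results. This is a sketch assuming games are recorded as hypothetical `(winner, loser)` index pairs; `log_inv_logit` computes $\log \text{logit}^{-1}(x)$ in a form that avoids overflow and underflow, which can matter if a bettor assigns a very large strength gap to a game that goes the other way:

```python
import math
from typing import Sequence, Tuple

def log_inv_logit(x: float) -> float:
    """log(1 / (1 + exp(-x))), computed without overflow or underflow."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    # For negative x, rewrite as x - log(1 + exp(x)) so exp never overflows.
    return x - math.log1p(math.exp(x))

def score(beta: Sequence[float], games: Sequence[Tuple[int, int]]) -> float:
    """Sum over games of log Pr[winner beats loser] under ratings beta."""
    return sum(log_inv_logit(beta[w] - beta[l]) for w, l in games)

# Two hypothetical bettors rate two teams; team 0 then beats team 1.
games = [(0, 1)]
print(score([1.0, 0.0], games))  # bettor who favored team 0
print(score([0.0, 1.0], games))  # bettor who favored team 1 (lower score)
```

Every game contributes the log of a probability, so scores are never positive; a perfect score of 0 is only approached by ratings that put near-certainty on every actual result.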
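For instance, let’s suppose there are three teams playing each other round robin, and that team 1 beats team 2, team 1 beats team 3, and team 3 beats team 2. Now suppose we assigned strength $\beta_1$ to team 1, $\beta_2$ to team 2, and $\beta_3$ to team 3. The total score would be

$$\log \text{logit}^{-1}(\beta_1 - \beta_2) + \log \text{logit}^{-1}(\beta_1 - \beta_3) + \log \text{logit}^{-1}(\beta_3 - \beta_2).$$

This is the log of the probability assigned by the coefficients $\beta$ to the outcomes of the three games.
Breaking out the probabilities for the individual games, note that $\Pr[1 \text{ beats } 2] = \text{logit}^{-1}(\beta_1 - \beta_2)$, $\Pr[1 \text{ beats } 3] = \text{logit}^{-1}(\beta_1 - \beta_3)$, and $\Pr[3 \text{ beats } 2] = \text{logit}^{-1}(\beta_3 - \beta_2)$.
Multiplying these three probabilities together gives the probability of all three observed results, and taking its log gives back the total score.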
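To make the three-team example concrete, here is the calculation with illustrative strengths $\beta_1 = 1$, $\beta_2 = 0$, $\beta_3 = 0.5$. These values are hypothetical, picked only for the demonstration; the point is that the score equals the log of the product of the per-game probabilities:

```python
import math

def inv_logit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Hypothetical strengths; keys match the team numbers in the text.
beta = {1: 1.0, 2: 0.0, 3: 0.5}

# Observed results as (winner, loser): 1 beats 2, 1 beats 3, 3 beats 2.
games = [(1, 2), (1, 3), (3, 2)]

probs = [inv_logit(beta[w] - beta[l]) for w, l in games]
joint = math.prod(probs)                       # probability of all three results
total_score = sum(math.log(p) for p in probs)  # sum of per-game log probabilities

for (w, l), p in zip(games, probs):
    print(f"Pr[team {w} beats team {l}] = {p:.3f}")
print(f"joint probability = {joint:.3f}, score = {total_score:.3f}")
```

With these values the three game probabilities come out to about 0.731, 0.622, and 0.622, for a joint probability of about 0.283.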
Fitting the Model Given Data
Given the outcomes, it’s easy to optimize the coefficients: it’s just the maximum likelihood estimate for the Bradley-Terry model! Of course, we can also compute Bayesian posteriors and go the full Bayesian inference route (Gelman et al.’s Bayesian Data Analysis goes over the chess case, where there may be draws).
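Here is one minimal way to do the fitting in Python: gradient ascent on the log likelihood. Two assumptions to flag. First, plain maximum likelihood has no finite solution when a team is undefeated (as team 1 is in the three-team example), and adding a constant to every coefficient doesn’t change the likelihood, so this sketch adds an L2 penalty `lam`, which makes it a posterior mode under a normal prior rather than a pure MLE. Second, the step size and iteration count are arbitrary illustrative settings, not tuned values:

```python
import math
from typing import List, Sequence, Tuple

def inv_logit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def fit_bradley_terry(num_teams: int,
                      games: Sequence[Tuple[int, int]],
                      lam: float = 0.1,
                      lr: float = 0.1,
                      iters: int = 5000) -> List[float]:
    """Penalized maximum likelihood by gradient ascent.

    Maximizes sum_k log inv_logit(beta[w_k] - beta[l_k]) - (lam / 2) * sum(beta^2),
    i.e. the posterior mode under independent normal(0, 1/lam) priors.
    games holds (winner, loser) team-index pairs.
    """
    beta = [0.0] * num_teams
    for _ in range(iters):
        # Gradient of the penalty term.
        grad = [-lam * b for b in beta]
        for w, l in games:
            # d/d(beta_w) log inv_logit(beta_w - beta_l) = inv_logit(beta_l - beta_w),
            # and the derivative with respect to beta_l is its negative.
            g = inv_logit(beta[l] - beta[w])
            grad[w] += g
            grad[l] -= g
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta

# The three-team example, 0-indexed: team 1 -> 0, team 2 -> 1, team 3 -> 2.
games = [(0, 1), (0, 2), (2, 1)]
beta_hat = fit_bradley_terry(3, games)
print([round(b, 3) for b in beta_hat])
```

On the three-team results this puts team 1 on top, team 3 in the middle, and team 2 at the bottom.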
An obvious way to do this would be to use the season’s games to fit the coefficients. You could add in all the other teams’ games, too, since they provide information on team strengths, and then just use the coefficients for the teams in the tournament for prediction. A multilevel model would make sense in this setting, of course, to smooth the strengths. The pooling could be overall, by division, or whatever else made sense for the problem.
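One way to make pooling by division concrete is a posterior-mode (MAP) fit of a simple two-level model in which each team’s strength is drawn from a normal distribution centered on its division’s mean. Everything here is an assumption for illustration: the normal priors, the fixed scales `sigma` and `tau`, the division assignments, and the toy game data. A full Bayesian treatment would estimate the scales and sample the posterior rather than optimize it.

```python
import math
from typing import List, Sequence, Tuple

def inv_logit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def fit_pooled(division: Sequence[int], num_divisions: int,
               games: Sequence[Tuple[int, int]],
               sigma: float = 1.0, tau: float = 1.0,
               lr: float = 0.05, iters: int = 5000) -> Tuple[List[float], List[float]]:
    """MAP estimates of team strengths beta and division means mu by gradient ascent.

    Assumed model, with sigma and tau fixed rather than estimated:
      mu[d]   ~ Normal(0, tau^2)
      beta[n] ~ Normal(mu[division[n]], sigma^2)
      Pr[w beats l] = inv_logit(beta[w] - beta[l])
    games holds (winner, loser) team-index pairs.
    """
    n = len(division)
    beta = [0.0] * n
    mu = [0.0] * num_divisions
    for _ in range(iters):
        # Prior terms: each team is pulled toward its division mean,
        # and each division mean is pulled toward zero.
        g_beta = [-(beta[t] - mu[division[t]]) / sigma**2 for t in range(n)]
        g_mu = [-m / tau**2 for m in mu]
        for t in range(n):
            g_mu[division[t]] += (beta[t] - mu[division[t]]) / sigma**2
        # Likelihood terms: d/d(beta_w) log inv_logit(beta_w - beta_l) = inv_logit(beta_l - beta_w).
        for w, l in games:
            g = inv_logit(beta[l] - beta[w])
            g_beta[w] += g
            g_beta[l] -= g
        beta = [b + lr * g for b, g in zip(beta, g_beta)]
        mu = [m + lr * g for m, g in zip(mu, g_mu)]
    return beta, mu

# Hypothetical toy data: teams 0, 1, 4 are in division 0 and teams 2, 3 in division 1.
# Team 4 plays no games, so its estimate is driven entirely by its division.
division = [0, 0, 1, 1, 0]
games = [(0, 2), (1, 3), (0, 3), (1, 2)]
beta_hat, mu_hat = fit_pooled(division, 2, games)
print([round(b, 3) for b in beta_hat], [round(m, 3) for m in mu_hat])
```

The team with no games lands exactly on its division mean, which is the smoothing a multilevel model buys you for teams with little data.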
How to Explain it to the Punters?
No idea. You could have them assign numbers and then they could explore what the predictions are for any pair of teams.
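One possible aid for the punters (a suggestion, not part of the scheme above): since only strength differences matter, a punter who thinks team $i$ beats team $j$ with probability $p$ can turn that belief into a gap using the logit, $\beta_i - \beta_j = \log(p / (1 - p))$. A minimal sketch:

```python
import math

def logit(p: float) -> float:
    """Strength gap beta_i - beta_j implied by Pr[i beats j] = p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    """Numerically stable inverse logit: converts a gap back into a probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# "Team A wins 3 out of 4 times against team B" corresponds to a gap of log(3).
gap = logit(0.75)
print(round(gap, 4))    # 1.0986
print(inv_logit(gap))   # back to about 0.75
```

Gaps add up: if A is favored over B by $\log 3$ and B over C by $\log 3$, the model implies A beats C with probability $\text{logit}^{-1}(2 \log 3) = 0.9$.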