Batting Averages: Bayesian vs. MLE Estimates


It’s American baseball season, and I’ve been itching to do more baseball stats. So in the interest of explaining Bayesian posterior inference with an example…

Batting Averages

A player’s batting average is his number of “hits” divided by his number of “at bats“.

Binomial Model of Batting

For simplicitly, we’ll model a sequence of N at bats using a binomial distribution with a \theta probability of getting a hit in each at bat. This assumes that the sequence of at bats is independent and identically distributed, aka iid.

Of course this is wrong. All models are wrong. It’s just that this one’s more wrong than usual, despite the popularity of the batting average as a summary statistic. (See the second last section for a reference to a better model.)

Maximum Likelihood Estimate (MLE)

Given a sample (aka training data) consisting of a player with m hits in M at bats (where m > 0,  (M-m) > 0), the maximum-likelihood estimate (MLE) of the ability \theta is m/M, aka the player’s batting average.

Uniform Prior

We assume a uniform (aka noninformative) prior distribution over abilities.

This is also wrong. We know major league position players don’t bat under 0.200 or they’d be fired, and no one’s had a batting average of over 0.400 since Ted Williams.

But we’ll have to come back to issues with the prior later. This post is just to clarify what integration over the posterior does for inference, not what an informative prior does.

Bayesian Posterior Estimate

The beta prior is conjugate to the binomial. Because the uniform distribution may be expressed as a beta distribution, \mbox{\sf Beta}(1,1), the posterior is a beta distribution. Because the beta and binomial play nicely together, the posterior ability distribution p(\theta|m,M) may be expressed analytically as \mbox{\sf Beta}(1+m, 1+M-m). The maximum of \mbox{\sf Beta}(1+m,1+M-m) is m/M, so the maximum a posteriori (MAP) estimate of \theta is also m/M, the same as the maximum likelihood estimate.

Posterior Predictive Inference

Now let’s look at prediction. For concreteness, suppose we’ve observed M=10 at bats, of which m=3 were hits, for a batting average of 3/10 = 0.300. What we want to know is the likelihood of n hits in the next N=5 at bats.

There are six possible outcomes, 0, 1, 2, 3, 4, or 5 hits. Using the MLE or MAP point estimate, this distribution is \mbox{\sf Binomial}(n|5,0.300).

The Bayesian estimate requires integrating over the posterior, effectively averaging the binomial prediction for each ability by the ability’s probability. The posterior in this case is \mbox{\sf Beta}(3+1,10-3+1)=\mbox{\sf Beta}(4,8), so the integral results in the well known beta-binomial distribution:

\mbox{\sf BetaBinomial}(n|5, 4,8) = \int_0^1 \mbox{\sf Binomial}(n|5,\theta) \mbox{\sf Beta}(\theta|4,8) d\theta

I solved the integral numerically using Monte Carlo integration, which is a fancy way of saying I took a bunch of samples \theta_n from the beta posterior \mbox{\sf Beta}(4,8) and plugged them into \mbox{\sf Binomial}(5,\theta_n) and averaged the resulting predictions. It’s a few lines of R (see below). The beta-binomial may be expessed analytically, but that’s even more typing than the Monte Carlo solution. I’m sure there’s an R package that’d do it for me.

Comparing the Results

The important issue is to look at the outputs. Given that we’ve seen a batter get 3 hits in 10 at bats, here’s our prediction using a binomial model with uniform prior, shown with both the MLE/MAP point estimate and the average of the Bayesian predictions:

Point vs. Bayesian Estimates
Hits MLE/MAP Bayesian
0 0.168 0.180
1 0.360 0.302
2 0.309 0.275
3 0.132 0.165
4 0.028 0.064
5 0.002 0.012

The Bayesian estimate here is more dispersed (flatter, higher variance, higher entropy) than the MLE estimate. Of course, a true Bayesian estimate would provide a joint distribution over outcomes 0,1,2,3,4,5 (which reduce to a simplex, because any 5 values determine the 6th) for downstream reasoning. The nice thing about passing uncertainty along by integration is that it’s very plug-and-play.

Advanced Reading

Luckily for us baseball stats geeks, there’s some advanced reading hot off the press from Wharton Business School:

R Source

Here’s how I generated the table.

# observed
hits = 3
atBats = 10

# predicted
nextAtBats = 5

# y: MLE predictive distro
average = hits/atBats
y = rep(0,nextAtBats+1);
for (i in 0:nextAtBats) y[i+1] = dbinom(i,nextAtBats,average)

# z: Bayesian predictive distro
numSamps = 20000
for (k in 1:numSamps) {
    sampledAvg = rbeta(1,1+hits,1+atBats-hits)
    for (i in 0:nextAtBats)
        z[i+1] = z[i+1] + dbinom(i,nextAtBats,sampledAvg)
for (i in 0:nextAtBats)
    z[i+1] = z[i+1] / numSamps

print("Point"); print(y, digits=3)
print("Bayes"); print(z, digits=3)

2 Responses to “Batting Averages: Bayesian vs. MLE Estimates”

  1. staffbeethoven Says:

    Hi thank you for the post. But I got error running your code: Object z is not found.

    Also, why did you only want one sample, as the first input argument in the line:
    sampledAvg = rbeta(1,1+hits,1+atBats-hits)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s