## Batting Averages: Bayesian vs. MLE Estimates

It’s American baseball season, and I’ve been itching to do more baseball stats. So in the interest of explaining Bayesian posterior inference with an example…

### Batting Averages

A player’s batting average is his number of “hits” divided by his number of “at bats”.

### Binomial Model of Batting

For simplicity, we’ll model a sequence of $N$ at bats using a binomial distribution with a probability $\theta$ of getting a hit in each at bat. This assumes that the sequence of at bats is independent and identically distributed, aka iid.

Of course this is wrong. All models are wrong. It’s just that this one’s more wrong than usual, despite the popularity of the batting average as a summary statistic. (See the second-to-last section for a reference to a better model.)
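Concretely, the iid binomial model can be simulated in a couple of lines of R; the ability `theta` and season length `N` below are made-up values for illustration:

```r
# simulate a season of N iid at bats for a player with ability theta
set.seed(123)    # for reproducibility
theta <- 0.300   # hypothetical true ability
N <- 500         # hypothetical number of at bats
hits <- rbinom(1, size = N, prob = theta)
hits / N         # sample batting average; concentrates near theta as N grows
```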

### Maximum Likelihood Estimate (MLE)

Given a sample (aka training data) consisting of a player with $m$ hits in $M$ at bats (where $m > 0, (M-m) > 0$), the maximum-likelihood estimate (MLE) of the ability $\theta$ is $m/M$, aka the player’s batting average.
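For the record, the MLE drops out of the log likelihood. Dropping the constant binomial coefficient and setting the derivative with respect to $\theta$ to zero,

$\frac{d}{d\theta}\left[\, m \log \theta + (M-m) \log (1-\theta) \,\right] = \frac{m}{\theta} - \frac{M-m}{1-\theta} = 0,$

which solves to $\hat{\theta} = m/M$.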

### Uniform Prior

We assume a uniform (aka noninformative) prior distribution over abilities.

This is also wrong. We know major league position players don’t bat under 0.200 or they’d be fired, and no one’s had a batting average of over 0.400 since Ted Williams.

But we’ll have to come back to issues with the prior later. This post is just to clarify what integration over the posterior does for inference, not what an informative prior does.

### Bayesian Posterior Estimate

The beta prior is conjugate to the binomial. Because the uniform distribution may be expressed as a beta distribution, $\mbox{\sf Beta}(1,1)$, the posterior is a beta distribution. Because the beta and binomial play nicely together, the posterior ability distribution $p(\theta|m,M)$ may be expressed analytically as $\mbox{\sf Beta}(1+m, 1+M-m)$. The maximum of $\mbox{\sf Beta}(1+m,1+M-m)$ is $m/M$, so the maximum a posteriori (MAP) estimate of $\theta$ is also $m/M$, the same as the maximum likelihood estimate.
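Spelled out, the conjugate update is just multiplying the likelihood by the (flat) prior and reading off the beta kernel:

$p(\theta|m,M) \propto \theta^m (1-\theta)^{M-m} \cdot 1,$

which is the kernel of $\mbox{\sf Beta}(1+m,\, 1+M-m)$. The mode of $\mbox{\sf Beta}(\alpha,\beta)$ with $\alpha, \beta > 1$ is $(\alpha - 1)/(\alpha + \beta - 2)$, which here works out to $m/M$.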

### Posterior Predictive Inference

Now let’s look at prediction. For concreteness, suppose we’ve observed $M = 10$ at bats, of which $m = 3$ were hits, for a batting average of $3/10 = 0.300$. What we want to know is the distribution over the number of hits $n$ in the next $N = 5$ at bats.

There are six possible outcomes, 0, 1, 2, 3, 4, or 5 hits. Using the MLE or MAP point estimate, this distribution is $\mbox{\sf Binomial}(n|5,0.300)$.
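The point-estimate predictive distribution is a one-liner with `dbinom`:

```r
# plug-in predictive: Binomial(n | 5, 0.300) for n = 0..5
p <- dbinom(0:5, size = 5, prob = 0.3)
round(p, 3)
# -> 0.168 0.360 0.309 0.132 0.028 0.002
```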

The Bayesian estimate requires integrating over the posterior, effectively averaging the binomial prediction for each ability by the ability’s probability. The posterior in this case is $\mbox{\sf Beta}(3+1,10-3+1)=\mbox{\sf Beta}(4,8)$, so the integral results in the well known beta-binomial distribution:

$\mbox{\sf BetaBinomial}(n|5, 4,8) = \int_0^1 \mbox{\sf Binomial}(n|5,\theta) \mbox{\sf Beta}(\theta|4,8) d\theta$
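For reference, the integral has the standard beta-binomial closed form $\mbox{\sf BetaBinomial}(n|N,\alpha,\beta) = \binom{N}{n} B(n+\alpha,\, N-n+\beta)/B(\alpha,\beta)$, where $B$ is the beta function, which is easy to evaluate in R:

```r
# analytic beta-binomial for n = 0..5 with N = 5, alpha = 4, beta = 8
n <- 0:5
bb <- choose(5, n) * beta(n + 4, 5 - n + 8) / beta(4, 8)
round(bb, 3)
# -> 0.181 0.302 0.275 0.165 0.064 0.013
```

These agree with the Monte Carlo averages up to sampling noise.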

I solved the integral numerically using Monte Carlo integration, which is a fancy way of saying I took a bunch of samples $\theta_k$ from the beta posterior $\mbox{\sf Beta}(4,8)$, plugged them into $\mbox{\sf Binomial}(n|5,\theta_k)$, and averaged the resulting predictions. It’s a few lines of R (see below). The beta-binomial may be expressed analytically, but that’s even more typing than the Monte Carlo solution. I’m sure there’s an R package that’d do it for me.

### Comparing the Results

The important thing is to compare the outputs. Given that we’ve seen a batter get 3 hits in 10 at bats, here’s our prediction using a binomial model with uniform prior, shown with both the MLE/MAP point estimate and the average of the Bayesian predictions:

**Point vs. Bayesian Estimates**

| Hits | MLE/MAP | Bayesian |
|------|---------|----------|
| 0    | 0.168   | 0.180    |
| 1    | 0.360   | 0.302    |
| 2    | 0.309   | 0.275    |
| 3    | 0.132   | 0.165    |
| 4    | 0.028   | 0.064    |
| 5    | 0.002   | 0.012    |

The Bayesian estimate here is more dispersed (flatter, higher variance, higher entropy) than the MLE estimate. Of course, a true Bayesian estimate would provide a joint distribution over the outcomes 0, 1, 2, 3, 4, 5 (which lives on a simplex, because the six probabilities sum to one, so any five determine the sixth) for downstream reasoning. The nice thing about passing uncertainty along by integration is that it’s very plug-and-play.

Luckily for us baseball stats geeks, there’s some advanced reading hot off the press from Wharton Business School.

### R Source

Here’s how I generated the table.

```r
# observed
hits <- 3
atBats <- 10

# predicted
nextAtBats <- 5

# y: MLE predictive distribution
average <- hits / atBats
y <- rep(0, nextAtBats + 1)
for (i in 0:nextAtBats)
  y[i + 1] <- dbinom(i, nextAtBats, average)

# z: Bayesian predictive distribution, by Monte Carlo integration
numSamps <- 20000
z <- rep(0, nextAtBats + 1)    # initialize the accumulator
for (k in 1:numSamps) {
  sampledAvg <- rbeta(1, 1 + hits, 1 + atBats - hits)
  for (i in 0:nextAtBats)
    z[i + 1] <- z[i + 1] + dbinom(i, nextAtBats, sampledAvg)
}
for (i in 0:nextAtBats)
  z[i + 1] <- z[i + 1] / numSamps

print("Point"); print(y, digits = 3)
print("Bayes"); print(z, digits = 3)
```


### 2 Responses to “Batting Averages: Bayesian vs. MLE Estimates”

1. staffbeethoven Says:

Hi thank you for the post. But I got error running your code: Object z is not found.

Also, why did you only want one sample, as the first input argument in the line:
sampledAvg = rbeta(1,1+hits,1+atBats-hits)