I’ll use the data in the example in the Wikipedia article on contingency tables:
| | Left-handed | Right-handed | Total |
|---|---|---|---|
| Male | 9 (y1) | 43 | 52 (n1) |
| Female | 4 (y2) | 44 | 48 (n2) |
The statistical hypothesis we wish to investigate is whether the proportion of left-handed men is greater than, equal to, or less than the proportion of left-handed women.
The observed data has y1 = 9 left-handed men out of a total of n1 = 52 men and y2 = 4 left-handed women out of a total of n2 = 48 women.
Letting theta1 be the proportion of left-handed men and theta2 the proportion of left-handed women, Andrew suggested modeling the number of left-handers as binomial:

y1 ~ Binomial(n1, theta1)
y2 ~ Binomial(n2, theta2)
He then suggested simple uniform priors on the proportions, which we know are equivalent to Beta(1,1) priors:

theta1 ~ Beta(1, 1)
theta2 ~ Beta(1, 1)
Given the conjugacy of the beta prior for the binomial likelihood, we can analytically compute the posteriors:

theta1 | y1, n1 ~ Beta(y1 + 1, n1 - y1 + 1) = Beta(10, 44)
theta2 | y2, n2 ~ Beta(y2 + 1, n2 - y2 + 1) = Beta(5, 45)
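These conjugate updates are easy to check numerically. Here's a short R sketch (variable names mirror the data above) that computes the posterior parameters, posterior means, and central 95% intervals with qbeta:

```r
# Beta(1,1) priors updated by binomial data (conjugacy):
# posterior for theta is Beta(y + 1, n - y + 1)
n1 = 52; y1 = 9   # men, left-handed men
n2 = 48; y2 = 4   # women, left-handed women

a1 = y1 + 1; b1 = n1 - y1 + 1   # Beta(10, 44) for theta1
a2 = y2 + 1; b2 = n2 - y2 + 1   # Beta(5, 45)  for theta2

# posterior means and central 95% intervals
print(c(mean1 = a1 / (a1 + b1), mean2 = a2 / (a2 + b2)))
print(qbeta(c(0.025, 0.975), a1, b1))  # interval for theta1
print(qbeta(c(0.025, 0.975), a2, b2))  # interval for theta2
```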
We could try to compute the posterior density of the difference delta = theta1 - theta2 analytically by solving the integral

p(delta | y, n) = integral over theta2 of Beta(theta2 + delta | 10, 44) * Beta(theta2 | 5, 45) dtheta2

and then evaluating Pr[delta > 0 | y, n].
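Short of a full analytic solution, the probability itself reduces to a one-dimensional integral by conditioning on theta2 and using the beta CDF. This isn't the route the post takes, just a numerical cross-check with R's integrate, using the posteriors derived above:

```r
# Pr[theta1 > theta2] = integral over t of p(theta2 = t) * Pr[theta1 > t]
# with theta1 ~ Beta(10, 44) and theta2 ~ Beta(5, 45)
f = function(t) dbeta(t, 5, 45) * (1 - pbeta(t, 10, 44))
prob = integrate(f, 0, 1)$value
print(prob)  # close to the simulation estimate of about 0.90
```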
Instead, we’ll just follow Andrew’s sage advice and estimate the integral by simulation. That is, we’ll take samples (theta1^(i), theta2^(i)), for i in 1:I, from the joint posterior

p(theta1, theta2 | y, n),

which, because the two binomials are independent, we write as

p(theta1, theta2 | y, n) = p(theta1 | y1, n1) * p(theta2 | y2, n2),

and then use the Monte Carlo approximation,

Pr[theta1 > theta2 | y, n] ≈ (1/I) * sum over i of I[theta1^(i) > theta2^(i)],
where I[·] is the indicator function, taking on value 1 if its property is true and 0 otherwise.
In words, we estimate the probability that theta1 is greater than theta2 by the fraction of simulated pairs in which theta1^(i) > theta2^(i).
I love R. Here’s the R code to compute estimated posteriors by simulation and visualize them.
```r
# DATA
n1 = 52   # men
y1 = 9    # left-handed men
n2 = 48   # women
y2 = 4    # left-handed women

# SIMULATION
I = 10000 # simulations
theta1 = rbeta(I, y1+1, (n1-y1)+1)
theta2 = rbeta(I, y2+1, (n2-y2)+1)
diff = theta1 - theta2  # simulated diffs

# OUTPUT
quantiles = quantile(diff, c(0.005, 0.025, 0.5, 0.975, 0.995))
print(quantiles, digits=2)
print("Probability higher % men than women lefties:")
print(mean(theta1 > theta2))
print("Mean difference theta1-theta2")
print(mean(diff))

# VISUALIZATION
# png(file="bayesContingency.png")
plot(density(diff),
     xlab="theta1 - theta2", ylab="p(theta1 - theta2 | y, n)",
     main="Posterior Simulation of Male - Female Lefties",
     ylim=c(0,8), frame.plot=FALSE, cex.lab=1.5, lwd=3, yaxt="no")
abline(v=quantiles, col="blue")
# dev.off()
```
Running this code prints out:
```
> source("twoByTwoPosterior.R")
  0.5%   2.5%    50%  97.5%  99.5%
-0.091 -0.046  0.084  0.218  0.262
"Probability higher % men than women lefties:"
0.901
"Mean difference theta1-theta2"
0.08484979
```
The first numbers displayed are the posterior quantiles for the difference. The posterior median (50% quantile) difference is 0.084. The other numbers are the quantiles that determine the central 95% and 99% posterior intervals, which span the (0.025, 0.975) and (0.005, 0.995) quantiles respectively. The central 95% posterior interval is thus (-0.046, 0.218). Quantiles get the obvious interpretation in Bayesian stats: given the model and data, we assign a 95% probability to the true difference lying in the interval (-0.046, 0.218).
The second number printed is the posterior probability that men have a higher percentage of left-handedness than women, which is 0.901, or about 90%. R makes this simple by letting us write mean(theta1 > theta2). This is also the value you’d get from integrating the posterior density of the difference theta1 - theta2 from 0 to 1.
The final number is the posterior mean of the difference between theta1 and theta2, which is the Bayesian point estimate of the difference. This value is 0.0848, which is a bit higher than the median difference of 0.084.
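The posterior mean of the difference is also available in closed form as the difference of the two beta posterior means, which gives a quick check on the simulation:

```r
# Exact posterior means: E[theta1] = 10/54, E[theta2] = 5/50
exact_mean_diff = 10/54 - 5/50
print(exact_mean_diff)  # about 0.0852, matching the simulated 0.0848
```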
The code also generates the following graph of the posterior:
The vertical lines are drawn at the plotted quantiles, including the boundaries of the central 95% region of the posterior difference theta1 - theta2.
Don’t ask. You’re a Bayesian now.
Instead, say that according to your model, based on the observed data, there’s a 90% chance of a higher prevalence of left-handedness among men than women, or that there’s a 95% chance the difference between male and female prevalence of left-handedness falls in the range (-0.05, 0.22). [Remember not to use too many digits. The extra digits in the R output were so you could see the small differences.]