EM’s awfully prone to local maxima in complex problems. Once there’s a reasonable amount of training data, I haven’t seen much in the way of improvement from EM. I’m thinking I’ll try Gibbs sampling instead of EM next; it’s even easier for classifiers than EM, and less prone to get stuck in local optima.

In my experience, scaling up the number of annotators tends to be more useful than running EM. I could be doing something wrong though.

Figuring out true paper “quality” given referee scores was one of the applications I’ve been thinking about. I originally thought a simple bootstrap analysis would be interesting and easy to do. But now I’m leaning more toward linear modeling, perhaps with a logistic link to capture the boundedness of the scale.
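As a rough sketch of what I mean by a logistic link here (the 1–5 score range and function names are just illustrative assumptions, not anyone’s actual setup): put latent paper quality on an unbounded scale and squash it through the logistic function onto the bounded referee scale.

```python
import numpy as np

def inv_logit(x):
    # logistic link: maps the real line to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def expected_score(quality, lo=1.0, hi=5.0):
    # map unbounded latent quality onto the bounded score scale [lo, hi];
    # the 1-5 range is an illustrative assumption
    return lo + (hi - lo) * inv_logit(quality)

# a paper of average latent quality (0) lands exactly mid-scale
print(expected_score(0.0))  # 3.0
```

The point of the link is that fitted scores can never escape the scale’s endpoints, no matter how extreme the linear predictor gets.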

Thanks for the reference. This is just the kind of thing I was looking for. And looking at the papers that cite it has opened up a vein of this kind of literature.

I’m working on a very similar approach using a Gibbs sampler, which has the nice property of giving me posterior uncertainty estimates of things like confusion matrices.
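To make the confusion-matrix uncertainty point concrete (the counts below are made up, and this shows only one conditional step of a Gibbs sampler, not the full model): given current label assignments, each row of an annotator’s confusion matrix has a Dirichlet posterior, so drawing from it directly gives posterior intervals.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical counts for one annotator: counts[k, j] = number of items
# whose current true-label assignment is k that this annotator labeled j
counts = np.array([[40, 5, 2],
                   [3, 30, 6],
                   [1, 4, 25]])

alpha = 1.0  # symmetric Dirichlet prior on each confusion-matrix row

# draw posterior samples of the whole confusion matrix, row by row
samples = np.stack([
    np.stack([rng.dirichlet(counts[k] + alpha) for k in range(counts.shape[0])])
    for _ in range(1000)
])

# posterior mean and 95% interval for this annotator's accuracy on label 0
acc0 = samples[:, 0, 0]
print(acc0.mean(), np.quantile(acc0, [0.025, 0.975]))
```

In the full sampler the label assignments would themselves be resampled each sweep, but this is where the “posterior uncertainty on confusion matrices” falls out almost for free.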

In searching for your reference, I found this, which is where I was planning to go with the posterior category samples from the fitted models:

Learning with Multiple Labels. Rong Jin and Zoubin Ghahramani.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.8894

I’ll post the details next week when I’m done writing up the paper.

We first assume known annotator accuracy (initialized, say, to perfect accuracy), and based on that we compute the most likely labels.

Then, based on the estimated labels, we can estimate each labeler's accuracy.

By iterating, we can converge to some good estimates of the labels, and generate a confusion matrix for each annotator.
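The iteration described above can be sketched as follows. This is a hard-assignment version with made-up add-one smoothing, not the full Dawid & Skene EM (which weights by soft label posteriors); initializing with majority vote is what "perfect accuracy" amounts to on the first pass.

```python
import numpy as np

def estimate_labels(anno, n_classes, n_iter=20):
    """anno[i, r] is annotator r's label for item i (0..n_classes-1).
    Alternate between (1) picking each item's most likely label given the
    confusion matrices and (2) re-estimating each annotator's confusion
    matrix from the current labels."""
    n_items, n_annos = anno.shape
    # step 0: trusting annotators perfectly reduces to majority vote
    labels = np.array([np.bincount(row, minlength=n_classes).argmax()
                       for row in anno])
    for _ in range(n_iter):
        # (2) confusion matrix per annotator, with add-one smoothing
        conf = np.ones((n_annos, n_classes, n_classes))
        for i in range(n_items):
            for r in range(n_annos):
                conf[r, labels[i], anno[i, r]] += 1
        conf /= conf.sum(axis=2, keepdims=True)
        # (1) most likely label per item under the current confusion matrices
        log_lik = np.zeros((n_items, n_classes))
        for k in range(n_classes):
            for r in range(n_annos):
                log_lik[:, k] += np.log(conf[r, k, anno[:, r]])
        new_labels = log_lik.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged to a fixed point
        labels = new_labels
    return labels, conf
```

With, say, three annotators where one disagrees occasionally, the fixed point recovers the consensus labels and the returned confusion matrices show the noisy annotator's off-diagonal mass.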

Take a look at

Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm

A. P. Dawid and A. M. Skene

Applied Statistics, Vol. 28, No. 1 (1979), pp. 20-28

http://www.jstor.org/pss/2346806