link: Latent Trait and Item-Response Model Bibliography

and also a top-level overview of inter-coder agreement:

link: Statistical Methods for Evaluating Interannotator Agreement

I’ve caught up on more of the literature since this post. I initially came at this from item-response models. Part of the problem is that different fields use different terminology for the same concepts.

I actually just finished my second pass through this:

Uebersax JS, Grove WM. 1993. A latent trait finite mixture model for the analysis of rating agreement. *Biometrics*.

The Uebersax and Grove (1993) paper not only introduces the latent trait model (each coder's traits are a rating threshold and a noisiness), but also gives a really nice description of the Gaussian mixture underlying the model and of the resulting ordinal logistic/probit regression model (ordinal models allow ratings on an ordered scale, such as 1-5 movie ratings).
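To make the latent trait idea concrete, here is a minimal Python sketch under simplifying assumptions of my own: each item's latent trait is drawn from a finite Gaussian mixture, and each coder perceives the trait plus Gaussian noise and reports how many of their ordered thresholds the perceived value exceeds, which is an ordinal probit. The function and parameter names (`ordinal_probs`, `sample_trait`, `sample_rating`, `thresholds`, `noise`) and the example values are hypothetical, and the paper's exact parameterization may differ.

```python
import math
import random

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordinal_probs(trait: float, thresholds: list[float], noise: float) -> list[float]:
    """P(rating = k | trait) for k = 0..len(thresholds).

    Hypothetical coder model: the coder perceives trait + N(0, noise^2)
    and reports k when the perceived value falls between the k-th and
    (k+1)-th cutpoints (an ordinal probit).
    """
    cuts = [-math.inf] + sorted(thresholds) + [math.inf]
    return [
        std_normal_cdf((hi - trait) / noise) - std_normal_cdf((lo - trait) / noise)
        for lo, hi in zip(cuts, cuts[1:])
    ]

def sample_trait(rng: random.Random, weights, means, sds) -> float:
    """Draw an item's latent trait from a finite Gaussian mixture."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sds[k])

def sample_rating(rng: random.Random, trait: float,
                  thresholds: list[float], noise: float) -> int:
    """Simulate one coder's ordinal rating (0-based) of an item."""
    perceived = trait + rng.gauss(0.0, noise)
    return sum(perceived > c for c in sorted(thresholds))

if __name__ == "__main__":
    rng = random.Random(0)
    # Two latent classes (say, negative and positive items); values are made up.
    trait = sample_trait(rng, weights=[0.7, 0.3], means=[-1.0, 1.5], sds=[0.8, 0.8])
    # Two hypothetical coders on a 1-5 scale: same noise, shifted thresholds.
    lenient = [-2.0, -1.0, 0.0, 1.0]
    harsh = [-1.0, 0.0, 1.0, 2.0]
    for name, cuts in [("lenient", lenient), ("harsh", harsh)]:
        probs = ordinal_probs(trait, cuts, noise=0.5)
        rating = sample_rating(rng, trait, cuts, noise=0.5) + 1  # map 0..4 to 1..5
        print(name, [round(p, 3) for p in probs], rating)
```

A coder with higher thresholds shifts probability mass toward lower ratings for the same item, and a larger `noise` flattens the distribution; those are the two coder traits in this simplified picture.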

The model from the Qu, Tan and Kutner (1996) *Biometrics* paper splits the predictors in two based on the latent class (the inferred true category), using one set for positive cases (sensitivity) and one for negative cases (specificity). In the Uebersax and Grove approach, by contrast, sensitivity and specificity are not parameters but derived quantities, following from each coder's threshold and noisiness.
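The contrast can be sketched in Python. Assume a simplified binary version of the latent trait model (an assumption here, not either paper's full model): a coder says "positive" when the item's trait plus Gaussian noise exceeds the coder's threshold, and within each class the trait is Gaussian. Then sensitivity and specificity follow in closed form from the threshold, the noise, and the class Gaussians. The second function is the plain latent class calculation, which instead takes per-coder sensitivity and specificity directly as parameters and assumes coders are conditionally independent given the true category. All names and numbers are hypothetical.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def derived_sens_spec(threshold, noise, pos_mean, pos_sd, neg_mean, neg_sd):
    """Sensitivity and specificity implied by a binary latent-trait coder.

    Within a class the trait is N(mean, sd^2), so the perceived value
    trait + N(0, noise^2) is N(mean, sd^2 + noise^2).
    """
    sens = std_normal_cdf((pos_mean - threshold) / math.hypot(pos_sd, noise))
    spec = std_normal_cdf((threshold - neg_mean) / math.hypot(neg_sd, noise))
    return sens, spec

def posterior_positive(ratings, sens, spec, prevalence):
    """P(true class = positive | binary ratings) in the plain latent class
    model, with coders conditionally independent given the class."""
    log_pos = math.log(prevalence)
    log_neg = math.log(1.0 - prevalence)
    for r, se, sp in zip(ratings, sens, spec):
        log_pos += math.log(se if r else 1.0 - se)
        log_neg += math.log(1.0 - sp if r else sp)
    # Normalize in log space for numerical stability.
    m = max(log_pos, log_neg)
    p, n = math.exp(log_pos - m), math.exp(log_neg - m)
    return p / (p + n)

if __name__ == "__main__":
    coders = [(0.0, 0.5), (0.5, 1.0), (-0.5, 0.8)]  # (threshold, noise), made up
    sens, spec = zip(*(derived_sens_spec(t, s, 1.5, 0.8, -1.0, 0.8) for t, s in coders))
    print([round(x, 3) for x in sens], [round(x, 3) for x in spec])
    print(round(posterior_positive([1, 1, 0], sens, spec, prevalence=0.3), 3))
```

Raising a coder's threshold trades sensitivity for specificity, which the latent class parameterization can only express as two separate free numbers.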

You’re right, though: this is basically a logical idea that has been rediscovered independently in several disciplines (though it also shows that few researchers these days know how to do a decent literature search).

In any case, in a series of articles, Gelfand and Solomon (JASA 1973/74/75) show that latent class models originated with Poisson, who used them to estimate the accuracy of jury decisions.

It’s really interesting that so many fields have reinvented aspects of these techniques.

Brendan
