Massimo and I are giving our tutorial this afternoon in Malta at LREC 2010. Here’s a link to my slides:

- Carpenter, Bob and Massimo Poesio. 2010. Models of Data Annotation. Part II of our tutorial at *LREC*.

Thanks to Massimo for lots of great suggestions and feedback: on the talk I gave last week in Rovereto, Italy (U. Trento), in our long discussions between then and now, and on the first few drafts of the slides.

And here’s a link to Massimo’s slides:

- Poesio, Massimo and Bob Carpenter (and Ron Artstein). 2010. Statistical Models of the Annotation Process. Part I of our tutorial at *LREC*.

Massimo’s part was based on his and Ron’s *CL Journal* review of kappa-like stats and on the tutorials they’ve given in the past. As such, his slides are much more mature; the examples are really tight and to the point.

May 17, 2010 at 7:25 pm

I’m really pleased to see this material getting out there — good work!

How long until we have a Bayesian replacement for evalb? (smile)

M

May 18, 2010 at 6:00 pm

Thanks for the vote of confidence.

As Andrew Gelman’s always suggesting, it makes sense to do multiple comparisons in a hierarchical model to evaluate bakeoffs.

At the very least, I’d like to see the bootstrap used for variance estimates rather than making all the erroneous independence assumptions you get with other tests.
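To make the bootstrap suggestion concrete, here is a minimal sketch of a paired bootstrap for the accuracy difference between two systems scored on the same test items. The function name `paired_bootstrap`, the toy correctness vectors, and the choice of a percentile interval are all assumptions for illustration, not anything from the tutorial. Resampling whole items keeps the two systems' scores paired, though it still treats the items themselves as exchangeable.

```python
import random
import statistics

def paired_bootstrap(correct_a, correct_b, num_resamples=2000, seed=0):
    """Bootstrap the accuracy difference (A minus B) over shared test items."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same items")
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(num_resamples):
        # Resample item indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(correct_a[i] for i in idx) / n
        acc_b = sum(correct_b[i] for i in idx) / n
        diffs.append(acc_a - acc_b)
    diffs.sort()
    # Percentile interval covering roughly the central 95% of resampled differences.
    lower = diffs[int(0.025 * num_resamples)]
    upper = diffs[int(0.975 * num_resamples) - 1]
    return statistics.mean(diffs), statistics.stdev(diffs), (lower, upper)

# Hypothetical per-item correctness (1 = correct) for two systems.
system_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0] * 20
system_b = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0] * 20
mean_diff, std_err, interval = paired_bootstrap(system_a, system_b)
print(f"difference {mean_diff:.3f}, std. error {std_err:.3f}, 95% interval {interval}")
```

The standard deviation of the resampled differences serves as the variance estimate; for a parsing bakeoff the resampled unit would presumably be the sentence rather than the individual bracket.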

What Massimo and I are worrying about now is how to take his Phrase Detectives coref data and create gold standards. It’s easy to move from binomial to multinomial or ordinal or scalar, but so far we haven’t figured out how to do it with coreference chains. The combinatorics of the sets are rather daunting.

May 27, 2010 at 4:52 am

I found the tutorial very useful, thank you.

I’ve been trying to find suitable paper(s) that explain this to present at our group’s reading club – you give some references in the slides (e.g. Bruce and Wiebe 1999) but you don’t specify exactly which paper, which is making it hard to track them down. Is there any chance you could give a full reference list? Is there a single paper that explains the sensitivity/specificity concept and why it’s a better option than majority voting?