From the Emailbox
I have a question about using the chunking evaluation class for inter-annotator agreement: how can you use it when the annotators might have missing chunks, i.e., if one of the files contains more chunks than the other?
The answer’s not immediately obvious, because the usual application of inter-annotator agreement statistics is to classification tasks (including things like part-of-speech tagging) that have a fixed number of items being annotated.
The chunker evaluations built into LingPipe calculate the usual precision and recall measures (see below). These evaluations compare a set of response chunkings to a set of reference chunkings. Usually the reference is drawn from a gold-standard corpus and the response from an automated system built to do chunking.
Precision (aka positive predictive accuracy) measures the proportion of chunks in the response that are also in the reference. Recall (aka sensitivity) measures the proportion of chunks in the reference that are in the response. If we swap the reference and response chunkings, we swap precision and recall.
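In symbols, counting over chunks, a true positive (TP) is a chunk appearing in both the reference and the response, a false positive (FP) is a chunk appearing only in the response, and a false negative (FN) is a chunk appearing only in the reference:

precision = TP/(TP+FP)

recall = TP/(TP+FN)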
True negatives aren’t really being counted here. In theory there is a huge number of them, since any possible span with any possible tag could have been labeled but wasn’t. LingPipe just sets the true negative count to zero, and as a result, specificity (TN/[TN+FP]) doesn’t make sense.
Suppose you have chunkings from two human annotators. Just treat one as the reference and the other as the response and run a chunking evaluation. The precision and recall values will tell you which annotator found more chunks. For instance, if precision is .95 and recall is .75, you know that the annotator assigned as the reference had a whole bunch of chunks (25% of them) that the other annotator didn’t mark as chunks, but almost all (95%) of the chunks found by the response annotator were also chunks in the reference.
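For example, here’s a minimal sketch of such an evaluation using LingPipe’s chunk package (assuming LingPipe 4; the text and chunk spans are made up for illustration):

import com.aliasi.chunk.ChunkFactory;
import com.aliasi.chunk.ChunkingEvaluation;
import com.aliasi.chunk.ChunkingImpl;
import com.aliasi.classify.PrecisionRecallEvaluation;

public class AnnotatorAgreement {
    public static void main(String[] args) {
        String text = "John Smith lives in New York.";

        // Annotator 1's chunking, treated as the reference.
        ChunkingImpl reference = new ChunkingImpl(text);
        reference.add(ChunkFactory.createChunk(0, 10, "PERSON"));
        reference.add(ChunkFactory.createChunk(20, 28, "LOCATION"));

        // Annotator 2's chunking, treated as the response;
        // this annotator missed the location chunk.
        ChunkingImpl response = new ChunkingImpl(text);
        response.add(ChunkFactory.createChunk(0, 10, "PERSON"));

        // Score annotator 2 against annotator 1.
        ChunkingEvaluation evaluation = new ChunkingEvaluation();
        evaluation.addCase(reference, response);

        PrecisionRecallEvaluation prEval
            = evaluation.precisionRecallEvaluation();
        System.out.println("precision=" + prEval.precision());
        System.out.println("recall=" + prEval.recall());
        System.out.println("F=" + prEval.fMeasure());
    }
}

Swapping the two arguments to addCase() swaps the reported precision and recall, matching the symmetry described above.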
You can use F-measure as an overall single-number score.
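The balanced F measure is just the harmonic mean of precision and recall, F = 2PR/(P+R). For the example above, that’s F = 2 × .95 × .75 / (.95 + .75) ≈ .84. And because F is symmetric in precision and recall, it doesn’t matter which annotator you treat as the reference.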
The base metrics are all explained in the class documentation for classify.PrecisionRecallEvaluation, and their application to chunking in the class documentation for chunk.ChunkingEvaluation.
Examples of running chunker evaluations can be found in:
- LingPipe: Named Entity Tutorial
LingPipe Annotation Tool
If you’re annotating entity data, you might be interested in our learn-a-little, tag-a-little tool.
- LingPipe Sandbox Project: citationEntities
Now that Mitzi’s brought it up to compatibility with LingPipe 4, we should move citationEntities out of the sandbox and into a tutorial.