Retraction: Only 1% Precision at 99.9% Recall for BioCreative Gene Chunks


Due to the bug we fixed in our precision-recall curve evaluations in the latest version (3.8.2) of LingPipe, I have to retract the results reported in:

  • Carpenter, Bob. 2007. LingPipe for 99.99% Recall of Gene Mentions. In Proceedings of the 2nd Biocreative Workshop. Valencia, Spain.

and unfortunately, the paragraph I contributed to:

My very first MEDLINE entry, and it’s buggy.

Original (Erroneous) Results

In those papers, I reported 7% precision at 99.99% recall, 8% precision at 99.9% recall, and 11% precision at 99% recall.

Corrected Results

It turns out that, using default LingPipe settings (n-gram = 5, interpolation = 5) with a max of 1024 chunks/sentence in our chunk.CharLmHmmChunker, the actual results are:

Recall    Precision
99%       3.6%
99.9%     0.9%
99.99%    0.6%
100%      0.5%

Reducing n-gram length to 4 raises precision at 99.9% recall to 1%, and reducing it to 3 raises it to 1.3%. As I speculated in the paper, less tightly fit models do a bit better in high recall settings, even though they do worse in high-F-measure evaluations.
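For anyone who wants to sanity-check numbers like these, here is a minimal sketch of how a precision-at-recall figure can be read off a confidence-ranked list of candidate chunks. It is not LingPipe's evaluator code: the function name `precision_at_recall`, its arguments, and the toy data are all hypothetical, and it assumes each candidate has already been marked correct or incorrect against the reference annotation. It also ignores ties in confidence.

```python
def precision_at_recall(scored, num_gold, target_recall):
    """Precision at the shallowest cutoff that reaches target_recall.

    scored: list of (confidence, is_correct) pairs, one per candidate chunk
            (hypothetical input format, not a LingPipe type).
    num_gold: total number of reference chunks.
    Returns None if the candidates never reach the target recall.
    """
    # Walk down the candidates from most to least confident.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            true_pos += 1
        # Recall is measured against all reference chunks, found or not.
        if true_pos / num_gold >= target_recall:
            return true_pos / rank
    return None


# Toy data: six candidates, three of which match reference chunks.
candidates = [(0.9, True), (0.8, False), (0.7, True),
              (0.6, False), (0.5, False), (0.4, True)]

two_thirds = precision_at_recall(candidates, num_gold=3, target_recall=0.66)
full = precision_at_recall(candidates, num_gold=3, target_recall=1.0)
unreachable = precision_at_recall(candidates, num_gold=4, target_recall=1.0)
```

In the toy data, reaching full recall means accepting all six candidates to recover three reference chunks, which is the same effect that drives precision down in the table above: the last few true mentions sit far down the ranked list.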

Luckily it still only takes 2 minutes to do a complete confidence-based 20-fold cross-validation in a single thread.

The code for the evaluation’s all checked into the

