LingPipe 2.4.0 Released


I just put LingPipe 2.4.0 up on our web servers. It’s a fairly minor change in terms of code, but the upgrades are not 100% backward compatible.

I ran tests with Sun JDKs 1.4, 1.5, and the now-official JDK 1.6. We’re sticking with 1.4 jar builds until Sun declares the end of 1.4 support.

Here’s the list of changes from the new home page:

MEDLINE 2007: The major new feature is support for the new 2007 MEDLINE document type definitions (DTDs). The tutorials have also been updated to reflect the new format.

Sequence Training for Token Language Models: The token language models and underlying tries have been upgraded to support sequence training with explicit counts. This means the new LDC Google disk can be used to train a token language model. Character language models already supported this feature. The underlying dynamic language model interface lm.LanguageModel.Dynamic was also changed to support sequence-level training with counts.
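To make "sequence training with explicit counts" concrete, here is a minimal sketch of a counter that accepts a token sequence together with a count, the way an n-gram collection with one count per line would be fed in. It is not LingPipe's implementation or API: the class name `CountedSequenceSketch` and the methods `addSequence`, `count`, and `conditionalEstimate` are made up for illustration. It also assumes that lower-order n-grams arrive with their own counts rather than being derived from the longer ones.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a flat map from token sequences to counts.
// A real token language model would use a trie and smoothing.
public class CountedSequenceSketch {

    private final Map<List<String>, Long> counts = new HashMap<>();

    // Add `count` observations of the token sequence in one call,
    // instead of training on the same sequence `count` times.
    public void addSequence(List<String> tokens, long count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        // Copy the list so later changes by the caller cannot corrupt the key.
        counts.merge(new ArrayList<>(tokens), count, Long::sum);
    }

    // Total count recorded for exactly this sequence (0 if unseen).
    public long count(List<String> tokens) {
        return counts.getOrDefault(tokens, 0L);
    }

    // Unsmoothed estimate of P(last token | preceding tokens), computed as
    // count(n-gram) / count(n-gram without its last token).
    public double conditionalEstimate(List<String> ngram) {
        if (ngram.isEmpty()) {
            throw new IllegalArgumentException("n-gram must not be empty");
        }
        long contextCount = count(ngram.subList(0, ngram.size() - 1));
        return contextCount == 0 ? 0.0 : (double) count(ngram) / contextCount;
    }
}
```

Repeated calls with the same sequence simply add up, so counts can be loaded from several files in any order.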

New Class spell.TfIdfDistance: This matches strings based on token frequency and inverse document frequency (TF/IDF). Any tokenizer may be used, including character n-grams.
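For readers who have not used TF/IDF for string matching, the following is a rough sketch of the idea, not the code in spell.TfIdfDistance. It assumes a lowercasing whitespace tokenizer, raw term frequency times log(N/df) as the weight, and cosine similarity as the comparison; the actual class may weight and normalize differently. All names here (`TfIdfSketch`, `addDocument`, `proximity`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Illustrative only: cosine similarity over TF/IDF-weighted token vectors.
public class TfIdfSketch {

    private final Map<String, Integer> docFrequency = new HashMap<>();
    private int numDocs = 0;

    // Assumed tokenizer: lowercase, split on whitespace.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.toLowerCase().split("\\s+")) {
            if (!tok.isEmpty()) tokens.add(tok);
        }
        return tokens;
    }

    // Record one training string; each distinct token counts once per string.
    public void addDocument(String text) {
        numDocs++;
        for (String tok : new HashSet<>(tokenize(text))) {
            docFrequency.merge(tok, 1, Integer::sum);
        }
    }

    // Inverse document frequency; tokens never seen in training get weight 0.
    public double idf(String token) {
        int df = docFrequency.getOrDefault(token, 0);
        return df == 0 ? 0.0 : Math.log((double) numDocs / df);
    }

    // Map each token of the string to (term frequency) * idf.
    private Map<String, Double> weights(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : tokenize(text)) tf.merge(tok, 1, Integer::sum);
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            w.put(e.getKey(), e.getValue() * idf(e.getKey()));
        }
        return w;
    }

    // Cosine of the two weighted vectors; 0 if either vector has no weight.
    public double proximity(String a, String b) {
        Map<String, Double> wa = weights(a);
        Map<String, Double> wb = weights(b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : wa.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * wb.getOrDefault(e.getKey(), 0.0);
        }
        for (double v : wb.values()) normB += v * v;
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }
}
```

Swapping `tokenize` for a character n-gram splitter would give the n-gram variant mentioned above without touching the rest of the sketch.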

New Class spell.JaroWinklerDistance: This matches strings based on the Jaro-Winkler distance used to deduplicate name data at the U.S. Census Bureau.
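As a rough illustration of what Jaro-Winkler measures, here is a from-scratch sketch of the textbook formulation. It is not the code in spell.JaroWinklerDistance, and it assumes the commonly cited defaults: a prefix scale of 0.1 and a maximum rewarded prefix of 4 characters. The class and method names are hypothetical. The methods return a similarity in [0, 1]; a distance can be taken as one minus that value.

```java
// Illustrative only: textbook Jaro and Jaro-Winkler similarity.
public class JaroWinklerSketch {

    // Jaro similarity: based on the number of matching characters found
    // within a sliding window and how many of them are out of order.
    public static double jaro(String s, String t) {
        if (s.isEmpty() && t.isEmpty()) return 1.0;
        if (s.isEmpty() || t.isEmpty()) return 0.0;
        int window = Math.max(0, Math.max(s.length(), t.length()) / 2 - 1);
        boolean[] sMatched = new boolean[s.length()];
        boolean[] tMatched = new boolean[t.length()];
        int matches = 0;
        for (int i = 0; i < s.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(t.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!tMatched[j] && s.charAt(i) == t.charAt(j)) {
                    sMatched[i] = true;
                    tMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Walk both strings' matched characters in order and count mismatches;
        // each transposition accounts for two out-of-order characters.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < s.length(); i++) {
            if (!sMatched[i]) continue;
            while (!tMatched[j]) j++;
            if (s.charAt(i) != t.charAt(j)) outOfOrder++;
            j++;
        }
        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / s.length() + m / t.length() + (m - transpositions) / m) / 3.0;
    }

    // Jaro-Winkler: boosts the Jaro score for strings sharing a prefix
    // (up to 4 characters, scale 0.1; both values are assumed defaults).
    public static double jaroWinkler(String s, String t) {
        double jaro = jaro(s, t);
        int maxPrefix = Math.min(4, Math.min(s.length(), t.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s.charAt(prefix) == t.charAt(prefix)) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }

    public static void main(String[] args) {
        // The classic name pair used to illustrate the measure.
        System.out.println(jaroWinkler("MARTHA", "MARHTA"));
    }
}
```

The prefix boost is what makes the measure suited to name data, where typos tend to occur later in the string than the first few characters.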

One Response to “LingPipe 2.4.0 Released”

  1. Taras Says:

    I’m sorry if my question is off-topic for this post.

    But could you please add a References section to the NER tutorial? There is one on the text-classification page and on the clustering page, but none for NER. I’m interested in the methods that LingPipe uses for this task, not in general articles about entity recognition.
    Is there a list of publications about the LingPipe core engine?

