LingPipe 3.6.0 Released


As usual, everything’s available for download at:

Intermediate Release

The latest release of LingPipe is LingPipe 3.6.0. This release replaces LingPipe 3.5.1, with which it is backward compatible except for the removal of some unused methods from the MEDLINE package. In addition to minor API additions in the form of utility methods, and some clarifications in the Javadoc for various methods and classes, the following substantial changes were made:

Hyphenation and Syllabification Tutorial

There’s a new hyphenation and syllabification tutorial, with evaluations on a range of publicly available and for-a-fee datasets.

Spell Checking Loader Improvements

We increased the speed and reduced the memory requirements for loading a compiled model into the spell checker. Loading should now need no memory beyond the size of the compiled model itself, and should be about twice as fast.

Length-Based Tokenizer Filter

We added a class tokenizer.LengthStopFilterTokenizer, which removes from a tokenizer's output any tokens longer than a specified maximum length.
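
To make the filtering rule concrete, here is a minimal sketch of the idea in plain Java. It does not use the LingPipe API; the class LengthFilterSketch and the method filterByLength are hypothetical names invented for illustration, and the sketch assumes that a token whose length equals the maximum is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LengthFilterSketch {

    // Hypothetical helper, not part of LingPipe: returns the tokens whose
    // length is at most maxLength, in their original order.
    public static List<String> filterByLength(Iterable<String> tokens, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be non-negative");
        }
        List<String> kept = new ArrayList<String>();
        for (String token : tokens) {
            // Assumption: only tokens strictly longer than the maximum are dropped.
            if (token.length() <= maxLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("the", "gene", "pneumonoultramicroscopic", "p53");
        // Prints [the, gene, p53]
        System.out.println(filterByLength(tokens, 8));
    }
}
```

A filter like this is mostly useful for throwing away junk tokens, such as long runs of unbroken characters, before they reach a model that keeps per-token statistics.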

MEDLINE Package Update

The com.aliasi.medline package was updated to reflect the fact that there are no book entries in MEDLINE. (There are Book elements in NLM DTDs used for MEDLINE, but there are no books in the data itself.)

The changes removed the medline.Book class; the methods inBook(), inJournal(), book(), and in() from the class medline.Article; and the element constant BOOK_ELT from medline.MedlineCitationSet.

The MEDLINE tutorials were also updated to remove all the branching logic for books.

MEDLINE Tutorial Update

We’ve updated the MEDLINE tutorial with much better downloaders and cleaned up the broken links and erroneous property specifications in the read-me so that they match the build file.

LingMed Sandbox Project

We’ve included a link from the LingPipe Sandbox to the code we use here for our back-end updating, storage, and indexing of bio-medical resources such as MEDLINE, Entrez-Gene, OMIM and GO. It contains extensive documentation and build files, but has lots of moving parts ranging from MySQL to RMI to Log4J.

The project includes a robust downloader to keep MEDLINE up to date, as well as index construction that may be used remotely through Lucene’s RMI integration. There’s a generic abstraction layer that supports object-relational mapping and querying through MySQL, and object-document mapping and search through Lucene.

The LingMed sandbox project also includes a basic version of our gene linkage application, which links mentions of genes and proteins to Entrez-Gene using name matching and context matching.

Citations Web Page

We added a new page for citations of LingPipe, including papers, books, classes, and patents.
