LingPipe 3.0 Released


We’re happy to announce the release of LingPipe 3.0. As usual, you may view it or download it from the LingPipe Home Page.

Major Release

The latest release of LingPipe is LingPipe 3.0.0. This major release replaces LingPipe 2.4.1. Although 3.0 maintains all of the functionality of 2.4, it is not 100% backward compatible. All demos and tutorials have been updated to the new API.

Generics

The most visible change is that the API now uses generic types. This has allowed us to remove many redundant classes and generalize other common behavior. Many of the classes now implement java.lang.Iterable, allowing them to be used in the Java 1.5 foreach construct. Most of the type-specific utility methods have been replaced with generic alternatives.

Keep in mind that, as in Java 1.5 itself, you don’t need to use the generics. You may continue to write non-generic code against our API, just as you can with the Java 1.5 collections framework.
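To make the Iterable point concrete, here is a minimal sketch of the pattern. TokenList and ForeachDemo are hypothetical names made up for this illustration, not LingPipe classes; the only real API used is java.lang.Iterable and the java.util collections.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical container, not a LingPipe class; it only illustrates the pattern.
class TokenList implements Iterable<String> {
    private final List<String> tokens = new ArrayList<String>();

    void add(String token) {
        tokens.add(token);
    }

    // Implementing Iterable<String> is what enables the foreach loop below.
    public Iterator<String> iterator() {
        return tokens.iterator();
    }
}

class ForeachDemo {
    // Sums the token lengths using the Java 1.5 foreach construct.
    static int totalLength(Iterable<String> tokens) {
        int total = 0;
        for (String token : tokens) {
            total += token.length();
        }
        return total;
    }

    public static void main(String[] args) {
        TokenList list = new TokenList();
        list.add("John");
        list.add("Smith");
        System.out.println(totalLength(list));
    }
}
```

Because the loop variable is typed by the generic parameter, no cast is needed inside the loop body.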

Clustering

The com.aliasi.cluster package was rewritten from the ground up. Now, rather than clustering the rows of a labeled matrix, a clusterer clusters a set of objects under a distance measure.
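The idea of clustering a set of objects under a distance measure can be sketched as follows. This is not the com.aliasi.cluster implementation: PairDistance and ThresholdClusterer are hypothetical names, and the single-link threshold rule used here is just one simple technique chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a distance measure over objects of type E.
interface PairDistance<E> {
    double distance(E a, E b);
}

// Hypothetical clusterer: two objects end up in the same cluster when a chain
// of pairs, each within maxDistance, connects them (single-link with a cutoff).
class ThresholdClusterer<E> {
    private final PairDistance<E> distance;
    private final double maxDistance;

    ThresholdClusterer(PairDistance<E> distance, double maxDistance) {
        this.distance = distance;
        this.maxDistance = maxDistance;
    }

    Set<Set<E>> cluster(Set<E> elements) {
        List<Set<E>> clusters = new ArrayList<Set<E>>();
        for (E element : elements) {
            // Start a new cluster and absorb every existing cluster near the element.
            Set<E> merged = new HashSet<E>();
            merged.add(element);
            List<Set<E>> untouched = new ArrayList<Set<E>>();
            for (Set<E> cluster : clusters) {
                if (isNear(element, cluster)) {
                    merged.addAll(cluster);
                } else {
                    untouched.add(cluster);
                }
            }
            untouched.add(merged);
            clusters = untouched;
        }
        return new HashSet<Set<E>>(clusters);
    }

    // True if any member of the cluster is within maxDistance of the element.
    private boolean isNear(E element, Set<E> cluster) {
        for (E member : cluster) {
            if (distance.distance(element, member) <= maxDistance) {
                return true;
            }
        }
        return false;
    }
}

class ClusterDemo {
    public static void main(String[] args) {
        PairDistance<Integer> absDiff = new PairDistance<Integer>() {
            public double distance(Integer a, Integer b) {
                return Math.abs(a - b);
            }
        };
        Set<Integer> points = new HashSet<Integer>(Arrays.asList(1, 2, 3, 10, 11));
        System.out.println(new ThresholdClusterer<Integer>(absDiff, 1.0).cluster(points));
    }
}
```

The clusterer never inspects the objects themselves; everything it knows comes from the distance function, which is why the same code would work for documents, strings, or entity mentions given a suitable measure.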

The dendrogram classes for hierarchical clustering results have been cleaned of their indexing behavior, which was only necessary for the previous implementations.

For the new API, there’s a completely new clustering tutorial which, among other things, uses linguistic examples such as clustering documents by topic or by entity mentioned. We’ve included Bagga and Baldwin’s John Smith data (197 New York Times articles annotated for which of 35 different John Smiths is mentioned); it’s available as the tarball demos/data/johnSmith.tar.gz.

LingPipe in Eclipse

We added a tutorial on LingPipe in Eclipse, which explains how to get started building LingPipe in the Eclipse Integrated Development Environment (IDE).

Distance and Proximity

Two generic interfaces were added to the utility package, Distance<E> and Proximity<E>. These are used not only in clustering, but also in the distance functions in the com.aliasi.spell package.
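As a rough sketch of how such a pair of types might be used, the example below declares local stand-ins so that it compiles without LingPipe on the classpath. The single two-argument method on each interface is an assumption for this illustration, and EditDistanceSketch and InverseProximity are hypothetical classes, not part of com.aliasi.spell.

```java
// Local stand-ins declared for this example; the method shapes are assumptions.
interface Distance<E> {
    double distance(E e1, E e2);
}

interface Proximity<E> {
    double proximity(E e1, E e2);
}

// Hypothetical example: plain Levenshtein edit distance with unit costs.
class EditDistanceSketch implements Distance<CharSequence> {
    public double distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of inserting the first j characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int substitute = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int delete = prev[j] + 1;
                int insert = curr[j - 1] + 1;
                curr[j] = Math.min(substitute, Math.min(delete, insert));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}

// Hypothetical adapter: turns any distance into a proximity in (0, 1].
class InverseProximity<E> implements Proximity<E> {
    private final Distance<E> distance;

    InverseProximity(Distance<E> distance) {
        this.distance = distance;
    }

    public double proximity(E e1, E e2) {
        return 1.0 / (1.0 + distance.distance(e1, e2));
    }
}

class DistanceDemo {
    public static void main(String[] args) {
        Distance<CharSequence> edit = new EditDistanceSketch();
        Proximity<CharSequence> prox = new InverseProximity<CharSequence>(edit);
        System.out.println(edit.distance("kitten", "sitting"));
        System.out.println(prox.proximity("kitten", "sitting"));
    }
}
```

Parameterizing on the element type lets one interface serve strings, documents, and vectors alike, which is presumably what makes a string-specific distance interface unnecessary.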

Matrices and Vectors

The com.aliasi.matrix package was simplified to remove the complexities of labeling. In the future, we plan to build this package out with sparse and memory-mapped matrices.

Iterators

Iterators that were formerly in util.Arrays, namely ArrayIterator and ArraySliceIterator, may now be found in the unified Iterators utility class. A new Iterators.Empty class was added in order to support genericity; it replaces the overloaded constant. Finally, util.SequenceIterator was rolled into util.Iterators along with the others, as util.Iterators.Sequence.
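The sketch below shows why a generic empty-iterator class is a natural replacement for a shared constant: a single untyped constant cannot be an Iterator<E> for every E without unchecked casts, whereas a small generic class can. IteratorsSketch and its nested classes are hypothetical, and the constructor arguments (array, start, length) are an assumption rather than the signatures in util.Iterators.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical utility class; it mirrors the idea, not the actual LingPipe code.
class IteratorsSketch {

    // Iterates over array[start] through array[start + length - 1].
    static class Slice<E> implements Iterator<E> {
        private final E[] array;
        private final int end;
        private int position;

        Slice(E[] array, int start, int length) {
            if (start < 0 || length < 0 || start + length > array.length) {
                throw new IllegalArgumentException("slice out of bounds");
            }
            this.array = array;
            this.position = start;
            this.end = start + length;
        }

        public boolean hasNext() {
            return position < end;
        }

        public E next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return array[position++];
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    // An iterator with no elements, typed for whatever E the caller needs.
    static class Empty<E> implements Iterator<E> {
        public boolean hasNext() {
            return false;
        }

        public E next() {
            throw new NoSuchElementException();
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    public static void main(String[] args) {
        String[] names = {"a", "b", "c", "d"};
        Iterator<String> it = new Slice<String>(names, 1, 2);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```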

MEDLINE Parsing Standardized

The medline.MedlineParser class was modified to implement corpus.Parser<MedlineHandler>. At the same time, the class medline.MedlineHandler was modified to implement corpus.Handler. The unused corpus.MedlineCitationHandler interface was removed.

ObjectToCounter Simplified

The util.ObjectToCounter interface was removed; we only ever used the util.ObjectToCounterMap implementation, a generic version of which remains.
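For readers unfamiliar with the idea, a generic object-to-count map can be sketched in a few lines. CounterMapSketch and its method names are hypothetical and chosen for this illustration; they are not claimed to match util.ObjectToCounterMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical generic counter; it illustrates the concept only.
class CounterMapSketch<E> {
    private final Map<E, Integer> counts = new HashMap<E, Integer>();

    // Adds one to the count for the key, starting from zero if unseen.
    void increment(E key) {
        Integer current = counts.get(key);
        counts.put(key, current == null ? 1 : current + 1);
    }

    // Returns the count for the key, or zero if it was never incremented.
    int getCount(E key) {
        Integer current = counts.get(key);
        return current == null ? 0 : current;
    }

    public static void main(String[] args) {
        CounterMapSketch<String> counter = new CounterMapSketch<String>();
        counter.increment("smith");
        counter.increment("smith");
        counter.increment("john");
        System.out.println(counter.getCount("smith"));
    }
}
```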

Unused Classes Removed

In the code review for generics, we found unused classes in the com.aliasi.coref package, Entity and EntityFactory. The class util.SmallArray was removed. The interface util.StringDistance was removed; it is replaced with the generic util.Distance interface. Finally, the util.Visitor interface was removed; the corpus.Handler interface is doing its job.
