LingPipe 3.5.0 Released


Intermediate Release

The latest release of LingPipe is LingPipe 3.5.0. This release replaces LingPipe 3.4.0 and is backward compatible with it, except for the matrix.Vector interface (details below).

Logistic Regression (aka Max Entropy)

The main addition in this release is multinomial logistic regression (often called "maximum entropy classification" in the computational linguistics literature). Logistic regression produces a probabilistic discriminative classifier with state-of-the-art accuracy. The regression estimators use stochastic gradient descent for scalability to large feature spaces with sparse inputs.
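For readers who want to see the idea in miniature, here's a rough sketch of one stochastic gradient step for multinomial logistic regression over a sparse input. This is not the LingPipe implementation or its API; the class name SoftmaxSgdSketch, the parallel-array input representation, and the fixed learning rate are all assumptions made for illustration, and there is no prior or annealing.

```java
import java.util.Arrays;

// Hypothetical standalone sketch; not the LingPipe API.
final class SoftmaxSgdSketch {

    // Class probabilities for one sparse input: dims[i] is a feature index, vals[i] its value.
    static double[] probs(double[][] w, int[] dims, double[] vals) {
        double[] z = new double[w.length];
        for (int k = 0; k < w.length; k++)
            for (int i = 0; i < dims.length; i++)
                z[k] += w[k][dims[i]] * vals[i];
        // Subtract the max before exponentiating for numerical stability.
        double max = Arrays.stream(z).max().getAsDouble();
        double sum = 0.0;
        for (int k = 0; k < z.length; k++) {
            z[k] = Math.exp(z[k] - max);
            sum += z[k];
        }
        for (int k = 0; k < z.length; k++)
            z[k] /= sum;
        return z;
    }

    // One gradient ascent step on the log likelihood of a single labeled example.
    // Only the dimensions that are non-zero in the input are touched.
    static void step(double[][] w, int[] dims, double[] vals, int label, double rate) {
        double[] p = probs(w, dims, vals);
        for (int k = 0; k < w.length; k++) {
            double err = (k == label ? 1.0 : 0.0) - p[k];
            for (int i = 0; i < dims.length; i++)
                w[k][dims[i]] += rate * err * vals[i];
        }
    }
}
```

Because each step only visits the non-zero dimensions of the input, the cost per example is proportional to the number of classes times the number of active features, not the size of the feature space.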

There is a direct matrix-based implementation and an adapter based on general feature extraction.

There are two support classes introduced for logistic regression: one for Laplace, Gaussian and Cauchy priors (also known as "regularizers"), and one for annealing schedules to control the gradient descent.

There is a new tutorial, referenced below, describing how to use these classes.
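To make the role of priors and annealing concrete, here's a small sketch of what each contributes to a gradient step. The formulas are the standard gradients of the zero-centered log densities; the class name PriorAnnealingSketch, the method names, and the parameterizations (variance for the Gaussian, scale for Laplace and Cauchy, two generic decay schedules) are assumptions for illustration and may not match how LingPipe's classes are parameterized.

```java
// Hypothetical standalone sketch; names and parameterizations are assumptions, not the LingPipe API.
final class PriorAnnealingSketch {

    // Gaussian prior N(0, variance): d/dbeta log p(beta) = -beta / variance.
    static double gaussianLogPriorGradient(double beta, double variance) {
        return -beta / variance;
    }

    // Laplace prior with the given scale: log p(beta) = -|beta| / scale + const.
    // Uses the subgradient 0 at beta == 0.
    static double laplaceLogPriorGradient(double beta, double scale) {
        return -Math.signum(beta) / scale;
    }

    // Cauchy prior with the given scale: log p(beta) = -log(beta^2 + scale^2) + const.
    static double cauchyLogPriorGradient(double beta, double scale) {
        return -2.0 * beta / (beta * beta + scale * scale);
    }

    // Exponentially decaying learning rate: initial * base^epoch, with 0 < base < 1.
    static double exponentialRate(double initial, double base, int epoch) {
        return initial * Math.pow(base, epoch);
    }

    // Inversely decaying learning rate; the rate is halved when epoch == halvingEpoch.
    static double inverseRate(double initial, double halvingEpoch, int epoch) {
        return initial / (1.0 + epoch / halvingEpoch);
    }
}
```

In a gradient ascent step, the prior gradient is added to the likelihood gradient for each coefficient, and the annealed rate scales the whole update, so early epochs take large steps and later ones settle down.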

Cross-Validating Classification Corpus

To support cross-validation evaluations for classifiers, there is a new corpus implementation.

There is a new tutorial section, referenced below, describing how to use this class.
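As a reminder of what cross-validation involves, here's a minimal sketch of splitting a list of items into folds, where each fold in turn is held out for testing and the rest is used for training. The class FoldSketch and its methods are hypothetical and are not the new corpus class; the sketch also assumes the items have already been shuffled if their order is not random.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch; not the LingPipe corpus API.
final class FoldSketch {

    // Start index (inclusive) of the given fold's contiguous slice.
    static int foldStart(int numItems, int numFolds, int fold) {
        return (int) ((long) numItems * fold / numFolds);
    }

    // Items held out for testing in the given fold.
    static <E> List<E> testFold(List<E> items, int numFolds, int fold) {
        int start = foldStart(items.size(), numFolds, fold);
        int end = foldStart(items.size(), numFolds, fold + 1);
        return new ArrayList<E>(items.subList(start, end));
    }

    // Items used for training in the given fold: everything outside the test slice.
    static <E> List<E> trainFolds(List<E> items, int numFolds, int fold) {
        int start = foldStart(items.size(), numFolds, fold);
        int end = foldStart(items.size(), numFolds, fold + 1);
        List<E> train = new ArrayList<E>(items.subList(0, start));
        train.addAll(items.subList(end, items.size()));
        return train;
    }
}
```

An evaluation loops over the folds, trains a fresh classifier on trainFolds for each one, tests it on testFold, and averages the results.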

LineParser and SVMlight Classification Parser

There’s an implementation of a parser for the SVMlight file-based classifier format. This parser is based on a new abstract line-based parser implementation.
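For anyone who hasn't seen the SVMlight format, each line holds a target followed by feature:value pairs, with an optional comment after a # character. Here's a rough sketch of parsing one such line. The class SvmLightLineSketch is hypothetical and is not the LingPipe parser; it assumes blank lines are filtered out by the caller and simply skips qid pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical standalone sketch; not the LingPipe parser.
final class SvmLightLineSketch {
    final String target;
    final Map<Integer, Double> features = new LinkedHashMap<Integer, Double>();

    private SvmLightLineSketch(String target) {
        this.target = target;
    }

    // Parses one non-blank line of the form: target feature:value ... # comment
    static SvmLightLineSketch parse(String line) {
        int hash = line.indexOf('#');
        String body = (hash >= 0 ? line.substring(0, hash) : line).trim();
        String[] tokens = body.split("\\s+");
        SvmLightLineSketch example = new SvmLightLineSketch(tokens[0]);
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0)
                throw new IllegalArgumentException("expected feature:value, found " + tokens[i]);
            String key = tokens[i].substring(0, colon);
            if (key.equals("qid"))
                continue; // query ids are only used for ranking; ignored here
            example.features.put(Integer.valueOf(key),
                                 Double.valueOf(tokens[i].substring(colon + 1)));
        }
        return example;
    }
}
```

A line-based parser then only has to read the input one line at a time and hand each line to a method like this.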

Pair Utility Class

There is a new class for pairs of heterogeneous types, introduced primarily as a utility for methods that return pairs of results.
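The general shape of such a utility is simple enough to sketch. The class PairSketch below, its accessor names, and the example method best are all hypothetical and written for illustration; they are not taken from the LingPipe source.

```java
// Hypothetical standalone sketch of an immutable pair of heterogeneous types.
final class PairSketch<A, B> {
    private final A first;
    private final B second;

    PairSketch(A first, B second) {
        this.first = first;
        this.second = second;
    }

    A a() { return first; }

    B b() { return second; }

    // Example of a method with two results: the best-scoring label and its score.
    // Assumes labels and scores are non-empty and of equal length.
    static PairSketch<String, Double> best(String[] labels, double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++)
            if (scores[i] > scores[best])
                best = i;
        return new PairSketch<String, Double>(labels[best], scores[best]);
    }
}
```

Returning a pair avoids defining a one-off result class or passing in an output parameter just to get a second value back.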

Additional Vector Methods

The interface matrix.Vector has been updated with two new methods that allow efficient inspection of non-zero dimensions. This was done so that the Vector interface can be used directly in logistic regression.

Client code that only uses vectors and matrices remains unaffected. A conflict will arise only for implementations of the Vector interface outside of LingPipe, which must now implement the new methods.
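Here's a sketch of why inspecting non-zero dimensions matters: a dot product against a dense weight array only needs to visit the entries that are actually set. The interface VectorSketch, its method names, and the map-backed implementation are assumptions for illustration; they are not the matrix.Vector interface or its actual method signatures.

```java
import java.util.TreeMap;

// Hypothetical interface; not matrix.Vector.
interface VectorSketch {
    int numDimensions();
    double value(int dimension);
    int[] nonZeroDimensions(); // indices with non-zero values, in ascending order
}

// Hypothetical sparse implementation backed by a sorted map.
final class MapVectorSketch implements VectorSketch {
    private final TreeMap<Integer, Double> entries = new TreeMap<Integer, Double>();
    private final int numDimensions;

    MapVectorSketch(int numDimensions) {
        this.numDimensions = numDimensions;
    }

    // Sets a value, dropping the entry when the value is zero; returns this for chaining.
    MapVectorSketch set(int dimension, double value) {
        if (value == 0.0)
            entries.remove(dimension);
        else
            entries.put(dimension, value);
        return this;
    }

    public int numDimensions() {
        return numDimensions;
    }

    public double value(int dimension) {
        Double v = entries.get(dimension);
        return v == null ? 0.0 : v;
    }

    public int[] nonZeroDimensions() {
        return entries.keySet().stream().mapToInt(Integer::intValue).toArray();
    }

    // Dot product that visits only the non-zero dimensions of x.
    static double dot(double[] weights, VectorSketch x) {
        double sum = 0.0;
        for (int d : x.nonZeroDimensions())
            sum += weights[d] * x.value(d);
        return sum;
    }
}
```

With a vector of a thousand dimensions and two entries set, the loop in dot runs twice, not a thousand times.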

Logistic Regression Tutorial

There’s a new classification tutorial which covers the new logistic regression classes in the stats and classify packages.

Cross-Validation Tutorial Section

There’s a new section in the topic classification tutorial covering cross-validation of classifiers and the corpus.Corpus class.
