Intermediate Release
The latest release of LingPipe is LingPipe 3.5.0. It replaces LingPipe 3.4.0 and is backward compatible with it except for the matrix.Vector interface (details below).
Logistic Regression (aka Max Entropy)
The main addition in this release is multinomial logistic regression (often called "maximum entropy classification" in the computational linguistics literature). Logistic regression produces a probabilistic discriminative classifier with state-of-the-art accuracy. The regression estimators use stochastic gradient descent, which scales to large feature spaces with sparse inputs.
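To make the idea concrete, here is a minimal sketch of multinomial logistic regression trained by stochastic gradient descent over sparse inputs. It is not LingPipe's implementation: the class name SgdLogisticSketch, its methods, and the parallel-array input representation are all assumptions made for illustration, and it omits priors and annealing.

```java
/** Illustrative multinomial logistic regression trained by SGD (not LingPipe's code). */
final class SgdLogisticSketch {
    final double[][] weights; // weights[k] is the weight vector for category k

    SgdLogisticSketch(int numCategories, int numDimensions) {
        weights = new double[numCategories][numDimensions];
    }

    /** Softmax over per-category dot products with a sparse input. */
    double[] probabilities(int[] dims, double[] vals) {
        double[] scores = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < weights.length; ++k) {
            for (int i = 0; i < dims.length; ++i)
                scores[k] += weights[k][dims[i]] * vals[i];
            max = Math.max(max, scores[k]);
        }
        double sum = 0.0;
        for (int k = 0; k < scores.length; ++k) {
            scores[k] = Math.exp(scores[k] - max); // subtract max for numerical stability
            sum += scores[k];
        }
        for (int k = 0; k < scores.length; ++k)
            scores[k] /= sum;
        return scores;
    }

    /** One stochastic gradient step; only the input's non-zero dimensions are touched. */
    void update(int[] dims, double[] vals, int category, double learningRate) {
        double[] p = probabilities(dims, vals);
        for (int k = 0; k < weights.length; ++k) {
            double error = (k == category ? 1.0 : 0.0) - p[k];
            for (int i = 0; i < dims.length; ++i)
                weights[k][dims[i]] += learningRate * error * vals[i];
        }
    }

    /** Index of the most probable category. */
    int classify(int[] dims, double[] vals) {
        double[] p = probabilities(dims, vals);
        int best = 0;
        for (int k = 1; k < p.length; ++k)
            if (p[k] > p[best]) best = k;
        return best;
    }
}
```

Each update costs time proportional to the number of categories times the number of non-zero input dimensions, independent of the total dimensionality, which is what makes the approach practical for sparse text features.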
There is a direct matrix-based implementation in
and an adapter based on general feature extraction
in:
There are two support classes introduced for logistic regression, one for Laplace, Gaussian and Cauchy priors (also known as "regularizers"):
and one for annealing schedules to control the gradient descent:
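As a rough illustration of what priors and annealing schedules contribute to gradient descent, the sketch below gives the gradient of the log density for zero-mean Gaussian, Laplace and Cauchy priors, plus two common learning-rate schedules. The class and method names are hypothetical, the parameterizations (variance for Gaussian and Laplace, scale for Cauchy) are one conventional choice, and the two schedules are textbook examples; none of this is claimed to match the LingPipe classes.

```java
/** Hypothetical prior gradients and learning-rate schedules (illustrative names only). */
final class PriorAndAnnealingSketch {

    /** Gradient of log N(w | 0, variance): pulls w toward zero in proportion to w. */
    static double gaussianGradient(double w, double variance) {
        return -w / variance;
    }

    /** Gradient of log Laplace(w | 0, variance): constant-size pull toward zero. */
    static double laplaceGradient(double w, double variance) {
        // signum(0) is 0, which is a valid subgradient at the kink
        return -Math.sqrt(2.0 / variance) * Math.signum(w);
    }

    /** Gradient of log Cauchy(w | 0, scale): the pull fades for large |w|. */
    static double cauchyGradient(double w, double scale) {
        return -2.0 * w / (w * w + scale * scale);
    }

    /** Inverse annealing: the rate is halved once epoch reaches annealingRate. */
    static double inverseRate(double initialRate, double annealingRate, int epoch) {
        return initialRate / (1.0 + epoch / annealingRate);
    }

    /** Exponential annealing: the rate is multiplied by base (0 < base < 1) every epoch. */
    static double exponentialRate(double initialRate, double base, int epoch) {
        return initialRate * Math.pow(base, epoch);
    }
}
```

In a regularized update, the prior gradient for a weight is added to the likelihood gradient before scaling by the current learning rate, so a smaller variance or scale means stronger shrinkage toward zero.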
There is a new tutorial describing how to use these classes referenced below.
Cross-Validating Classification Corpus
To support cross-validation evaluations for classifiers, there is a new corpus implementation:
There is a new tutorial section describing how to use this class referenced below.
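For readers unfamiliar with the technique, the following sketch shows the basic bookkeeping behind k-fold cross-validation: each item is held out for testing in exactly one fold and used for training in all the others. The FoldSketch class and its round-robin fold assignment are assumptions for illustration and say nothing about how the LingPipe corpus class is organized.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative k-fold partitioner (hypothetical class, not part of LingPipe). */
final class FoldSketch<E> {
    private final List<E> items;
    private final int numFolds;

    FoldSketch(List<E> items, int numFolds) {
        if (numFolds < 2 || numFolds > items.size())
            throw new IllegalArgumentException("need 2 <= numFolds <= number of items");
        this.items = new ArrayList<>(items); // defensive copy
        this.numFolds = numFolds;
    }

    /** Items held out for evaluation in the given fold. */
    List<E> testItems(int fold) {
        return select(fold, true);
    }

    /** Items available for training in the given fold. */
    List<E> trainItems(int fold) {
        return select(fold, false);
    }

    // Item i belongs to the test set of fold (i % numFolds).
    private List<E> select(int fold, boolean heldOut) {
        List<E> result = new ArrayList<>();
        for (int i = 0; i < items.size(); ++i)
            if ((i % numFolds == fold) == heldOut)
                result.add(items.get(i));
        return result;
    }
}
```

A typical evaluation loop trains a fresh classifier on trainItems(f) and scores it on testItems(f) for each fold f, then averages the results. If the items are ordered by category, shuffling them before partitioning keeps the folds balanced.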
LineParser and SVMlight Classification Parser
There’s an implementation of a parser for the SVMlight file-based classifier format:
This parser is based on a new abstract line-based parser implementation in:
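For reference, an SVMlight data line has the form "target feature:value feature:value ... # comment". The sketch below parses one such line; it is a standalone illustration with a hypothetical class name, it assumes integer feature identifiers (so it does not handle the qid field used for ranking), and it is not the LingPipe parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parse of one SVMlight-format line (hypothetical class, not LingPipe's). */
final class SvmLightLineSketch {
    final String target;                                           // e.g. "+1", "-1", "0.75"
    final Map<Integer, Double> features = new LinkedHashMap<>();   // feature id -> value

    SvmLightLineSketch(String line) {
        // Everything after '#' is a free-text comment.
        int hash = line.indexOf('#');
        String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
        String[] fields = data.split("\\s+");
        target = fields[0];
        for (int i = 1; i < fields.length; ++i) {
            int colon = fields[i].indexOf(':');
            if (colon < 0)
                throw new IllegalArgumentException("expected feature:value, found " + fields[i]);
            int feature = Integer.parseInt(fields[i].substring(0, colon));
            double value = Double.parseDouble(fields[i].substring(colon + 1));
            features.put(feature, value);
        }
    }
}
```

A line-based parser would call this once per non-blank line of the input file and hand each result to a classification handler.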
Pair Utility Class
There is a new class for pairs of heterogeneous types introduced primarily as a utility for methods that return pairs of results:
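The general shape of such a class is sketched below as an immutable generic pair. The name PairSketch and its accessor names are assumptions for illustration; the LingPipe class may differ in naming and detail.

```java
import java.util.Objects;

/** Illustrative immutable pair of heterogeneous types (hypothetical, not LingPipe's class). */
final class PairSketch<A, B> {
    private final A first;
    private final B second;

    PairSketch(A first, B second) {
        this.first = first;
        this.second = second;
    }

    A first() { return first; }

    B second() { return second; }

    @Override
    public boolean equals(Object that) {
        if (!(that instanceof PairSketch)) return false;
        PairSketch<?, ?> other = (PairSketch<?, ?>) that;
        return Objects.equals(first, other.first) && Objects.equals(second, other.second);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }
}
```

A method that needs to return, say, a best category together with its score can then declare a return type such as PairSketch<String, Double> instead of defining a one-off result class.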
Additional Vector Methods
The matrix.Vector interface has been updated with two new methods that allow efficient inspection of non-zero dimensions. This lets vectors be used directly in logistic regression.
Client code that only uses vectors and matrices is unaffected. A conflict arises only for implementations of Vector outside of LingPipe, which will need to implement the new methods.
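To show why access to non-zero dimensions matters, here is a sketch of a dot product that visits only the non-zero entries of a sparse vector. The interface SketchVector, the method name nonZeroDimensions, and the other names are hypothetical stand-ins chosen for this example; they are not presented as the signatures added to matrix.Vector.

```java
import java.util.Arrays;

/** Hypothetical vector interface; the method names are illustrative only. */
interface SketchVector {
    int numDimensions();
    double value(int dimension);
    /** Dimensions holding non-zero values, in increasing order. */
    int[] nonZeroDimensions();
}

/** Sparse implementation backed by parallel arrays of dimensions and values. */
final class SparseSketchVector implements SketchVector {
    private final int numDimensions;
    private final int[] dims;    // sorted, distinct dimension indexes
    private final double[] vals; // vals[i] is the value at dims[i]

    SparseSketchVector(int numDimensions, int[] dims, double[] vals) {
        this.numDimensions = numDimensions;
        this.dims = dims.clone();
        this.vals = vals.clone();
    }

    public int numDimensions() { return numDimensions; }

    public double value(int dimension) {
        int i = Arrays.binarySearch(dims, dimension);
        return i >= 0 ? vals[i] : 0.0;
    }

    public int[] nonZeroDimensions() { return dims.clone(); }
}

final class VectorOps {
    /** Dot product that loops over non-zero dimensions rather than all dimensions. */
    static double dot(double[] weights, SketchVector v) {
        double sum = 0.0;
        for (int d : v.nonZeroDimensions())
            sum += weights[d] * v.value(d);
        return sum;
    }
}
```

With a million-dimensional feature space and a document that has a few dozen active features, the loop above runs a few dozen times rather than a million.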
Logistic Regression Tutorial
There’s a new classification tutorial which covers the new logistic regression classes in the stats and classify packages:
Cross-Validation Tutorial Section
There’s a new section in the topic classification tutorial covering cross-validation of classifiers and the corpus.Corpus class.