The latest release of LingPipe is LingPipe 3.2.0. This release replaces LingPipe 3.1.2, with which it is backward compatible with one exception (see next section).

#### Backward Incompatibility

The p-value methods in `stats.BinomialDistribution` and `stats.MultinomialDistribution` have been removed. The javadoc for the classes includes the code for the removed implementations, which were based on the Jakarta Commons Math library.

#### Zero External Dependencies

We removed the p-value methods because they were the last remaining functionality in LingPipe that required external dependencies. As of this release, there is no longer a dependency on the Jakarta Commons Math library or the Apache Xerces XML libraries. The XML functionality formerly supplied by Xerces has been folded into Java itself.

#### New Features

*Singular Value Decomposition*

We’ve included an implementation of singular value decomposition. It uses stochastic gradient descent, so it scales to large matrices and allows partial matrix input (matrices with some unknown values). The implementation is encapsulated in a single class, `matrix.SvdMatrix`, the javadoc of which explains the mathematics of SVD and our implementation.
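To make the idea of gradient-based factorization with unknown cells concrete, here is a minimal sketch. It is not the code in `matrix.SvdMatrix`: the class name `SgdFactorizationSketch`, its method names, and the use of `Double.NaN` to mark unknown values are all assumptions made for illustration. It also returns plain factor matrices without orthonormalizing them or separating out singular values.

```java
import java.util.Random;

/** Hypothetical sketch; not the implementation in matrix.SvdMatrix. */
final class SgdFactorizationSketch {

    /**
     * Fits x[i][j] ~= sum over f of u[i][f] * v[j][f] by stochastic gradient
     * descent on squared error, visiting only the known cells.
     * Double.NaN marks an unknown cell (an assumption of this sketch).
     * Returns the pair {u, v}.
     */
    static double[][][] factor(double[][] x, int rank, double learningRate,
                               double regularization, int epochs, long seed) {
        int numRows = x.length;
        int numCols = x[0].length;
        Random random = new Random(seed);
        double[][] u = smallPositive(numRows, rank, random);
        double[][] v = smallPositive(numCols, rank, random);
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < numRows; i++) {
                for (int j = 0; j < numCols; j++) {
                    if (Double.isNaN(x[i][j])) continue; // unknown cell: no update
                    double error = x[i][j] - predict(u, v, i, j);
                    for (int f = 0; f < rank; f++) {
                        double uOld = u[i][f]; // keep the old value for v's update
                        u[i][f] += learningRate * (error * v[j][f] - regularization * uOld);
                        v[j][f] += learningRate * (error * uOld - regularization * v[j][f]);
                    }
                }
            }
        }
        return new double[][][] { u, v };
    }

    /** Reconstructed value for cell (i, j), known or unknown. */
    static double predict(double[][] u, double[][] v, int i, int j) {
        double sum = 0.0;
        for (int f = 0; f < u[i].length; f++) sum += u[i][f] * v[j][f];
        return sum;
    }

    /** Small positive starting values, so the search starts away from zero. */
    private static double[][] smallPositive(int rows, int cols, Random random) {
        double[][] m = new double[rows][cols];
        for (double[] row : m)
            for (int f = 0; f < cols; f++) row[f] = 0.05 + 0.1 * random.nextDouble();
        return m;
    }
}
```

For example, calling `factor` with rank 1 on the matrix `{{1, 2}, {2, 4}, {3, NaN}}` and then `predict(u, v, 2, 1)` should give a value close to 6, because the known rows are multiples of one another. The learning rate, regularization, and epoch count would all need tuning on real data.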

*SGML Normalizer*

We were tired of writing one-off SGML entity normalizers, so we imported a class representing the entire standard, `util.Sgml`.

*Line Tokenizer Factory*

To support our work on document parsing (text extraction, bibliography extraction, e-mail signature extraction, etc.), we’ve included a new tokenizer, `tokenizer.LineTokenizerFactory`, that produces tokens based on lines of text. It is used in the sandbox project citationEntities.
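As a rough illustration of line-based tokenization (not the LingPipe class itself), the sketch below returns each non-empty line of a text as one token. The class name `LineTokensSketch` and its `tokenize` method are hypothetical, and skipping empty lines is a choice of this sketch rather than documented behavior of `tokenizer.LineTokenizerFactory`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of line-based tokenization. */
final class LineTokensSketch {

    // By default '.' does not match line terminators,
    // so ".+" matches exactly one non-empty line at a time.
    private static final Pattern LINE = Pattern.compile(".+");

    /** Returns the non-empty lines of the text, in order, one token per line. */
    static List<String> tokenize(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = LINE.matcher(text);
        while (matcher.find()) tokens.add(matcher.group());
        return tokens;
    }
}
```

Each entry of a bibliography laid out one citation per line would come back as a single token, which downstream code can then classify or parse further.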

*Soundex Tokenizer Filter*

We’ve provided an implementation of Soundex as a tokenizer filter in `tokenizer.SoundexFilterTokenizer`. Soundex reduces strings of words to a simplified representation based on (English) pronunciation. We were mainly motivated by exploring features in our feature-based classifiers. The javadoc contains a complete description of the Soundex algorithm.
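For readers unfamiliar with Soundex, here is a minimal sketch of the commonly described American variant: keep the first letter, map later consonants to digit classes, drop vowels, collapse adjacent duplicates, and pad or truncate to four characters. It is a standalone illustration, not the code in `tokenizer.SoundexFilterTokenizer`, and the class name `SoundexSketch` is hypothetical. The javadoc remains the authority on the details LingPipe follows.

```java
/** Hypothetical sketch of the common American Soundex variant. */
final class SoundexSketch {

    // Codes for A..Z. '0' marks vowels and Y; '-' marks H and W.
    //                                   ABCDEFGHIJKLMNOPQRSTUVWXYZ
    private static final String CODES = "0123012-02245501262301-202";

    /** Returns a four-character code; non-letters are skipped. */
    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char previous = '0';
        for (int i = 0; i < word.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') continue;      // skip anything outside A..Z
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                     // the first letter is kept as-is
                previous = code;                   // and can suppress a following duplicate
                continue;
            }
            if (code == '-') continue;             // H and W do not separate duplicates
            if (code != '0' && code != previous) out.append(code);
            previous = code;                       // a vowel resets the duplicate check
        }
        while (out.length() < 4) out.append('0');  // pad short codes with zeros
        return out.toString();
    }
}
```

Under this variant `Robert` and `Rupert` both map to `R163`, which is the kind of collapsing that makes the codes usable as classifier features.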

*Distances and Proximities*

We made sure that if an object implemented one of the interfaces `util.Distance` or `util.Proximity`, it also implemented the other. This allows them all to be plugged into distance-based clusterers and classifiers (e.g., k-nearest neighbors).
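The pairing can be sketched with one class that offers both views of the same comparison. The two interfaces below are local stand-ins declared for this example, with assumed method names, and are not copied from `util.Distance` and `util.Proximity`. The Jaccard measure is chosen only because its distance is simply one minus its proximity.

```java
import java.util.HashSet;
import java.util.Set;

/** Local stand-in interface for this sketch; the method name is an assumption. */
interface Distance<E> {
    double distance(E e1, E e2);
}

/** Local stand-in interface for this sketch; the method name is an assumption. */
interface Proximity<E> {
    double proximity(E e1, E e2);
}

/** Hypothetical example: Jaccard similarity over string sets, exposed both ways. */
final class JaccardSketch implements Distance<Set<String>>, Proximity<Set<String>> {

    /** Size of the intersection divided by size of the union, in [0, 1]. */
    @Override
    public double proximity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0; // two empty sets count as identical
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    /** Defined from proximity, so the two views always agree. */
    @Override
    public double distance(Set<String> a, Set<String> b) {
        return 1.0 - proximity(a, b);
    }
}
```

A nearest-neighbor classifier written against the distance interface and a clusterer written against the proximity interface could then share one `JaccardSketch` instance.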

#### Tutorials

*Singular Value Decomposition*

The SVD Tutorial walks through the applications to latent semantic indexing and basic n-gram smoothing. It also covers all the tuning parameters (learning rate and annealing, initial values, early stopping, regularization, etc.).

*Word Sense Disambiguation Tutorial*

The Word Sense Tutorial provides details on creating a complete Senseval word sense disambiguation system using contextual classification. Word sense disambiguation is the problem of determining which dictionary entry (for a given dictionary) corresponds to a given token of a word in a text.