We’re happy to announce the release of LingPipe 3.0. As usual, you may view it or download it from the LingPipe Home Page.
Major Release
The latest release of LingPipe is LingPipe 3.0.0. This major release replaces LingPipe 2.4.1. Although 3.0 maintains all of the functionality of 2.4, it is not 100% backward compatible. All demos and tutorials have been updated to the new API.
Generics
The most evident change is that the API now uses generic types. This has allowed us to remove many redundant classes and generalize other common behavior. Many of the classes now implement java.lang.Iterable, allowing them to be used in the Java 1.5 foreach construct. Most of the type-specific utility methods have been replaced with generic alternatives.
Keep in mind that, as in Java 1.5 itself, you don’t need to use generics. You may continue to write non-generic code against our API, just as you can with the Java 1.5 collections framework.
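To illustrate both styles, here is a minimal sketch built on a hypothetical Bag<E> container (not a LingPipe class) that implements java.lang.Iterable<E>. The generic client gets compile-time element types in the foreach loop; the raw client still compiles, but receives Object elements and has to cast them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical container, not part of LingPipe. Because it implements
// java.lang.Iterable<E>, instances can be used in a Java 1.5 foreach loop.
class Bag<E> implements Iterable<E> {
    private final List<E> items = new ArrayList<E>();

    void add(E item) {
        items.add(item);
    }

    public Iterator<E> iterator() {
        return items.iterator();
    }
}

class GenericsDemo {
    // Generic client: the element type is checked at compile time.
    static int totalLength(Bag<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    // Raw (non-generic) client: still compiles, but elements come back
    // as Object and must be cast by hand.
    @SuppressWarnings("rawtypes")
    static int totalLengthRaw(Bag words) {
        int total = 0;
        for (Object o : words) {
            total += ((String) o).length();
        }
        return total;
    }
}
```

Both methods return the same result for the same bag; the difference is only in where type errors are caught.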
Clustering
The com.aliasi.cluster package was rewritten from the ground up. Rather than clustering the rows of a labeled matrix, a clusterer now clusters a set of objects under a distance measure.
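A minimal sketch of this design follows. It assumes a hypothetical Distance<E> interface with a single distance(E, E) method and a hypothetical ThresholdSingleLink clusterer; neither is LingPipe’s actual API. The clusterer performs single-link clustering cut at a distance threshold, which amounts to taking connected components of the graph that links every pair within maxDistance. It needs nothing from the objects beyond what the distance measure provides.

```java
import java.util.*;

// Hypothetical generic distance interface (an assumption, not LingPipe's API).
interface Distance<E> {
    double distance(E e1, E e2);
}

// Hypothetical clusterer: single-link clustering cut at maxDistance.
// Two objects share a cluster if a chain of pairs, each within
// maxDistance of the next, connects them.
class ThresholdSingleLink<E> {
    private final Distance<? super E> distance;
    private final double maxDistance;

    ThresholdSingleLink(Distance<? super E> distance, double maxDistance) {
        this.distance = distance;
        this.maxDistance = maxDistance;
    }

    Set<Set<E>> cluster(Set<? extends E> elements) {
        List<E> items = new ArrayList<E>(elements);
        int n = items.size();
        int[] parent = new int[n];              // union-find forest
        for (int i = 0; i < n; i++) parent[i] = i;
        // Link every pair whose distance is within the threshold.
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (distance.distance(items.get(i), items.get(j)) <= maxDistance)
                    union(parent, i, j);
        // Group elements by the root of their component.
        Map<Integer, Set<E>> byRoot = new HashMap<Integer, Set<E>>();
        for (int i = 0; i < n; i++) {
            Integer root = find(parent, i);
            Set<E> c = byRoot.get(root);
            if (c == null) {
                c = new HashSet<E>();
                byRoot.put(root, c);
            }
            c.add(items.get(i));
        }
        return new HashSet<Set<E>>(byRoot.values());
    }

    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];      // path halving
            i = parent[i];
        }
        return i;
    }

    private static void union(int[] parent, int i, int j) {
        parent[find(parent, i)] = find(parent, j);
    }
}
```

With an absolute-difference distance on integers and a threshold of 1.0, the set {1, 2, 3, 10, 11, 20} splits into {1, 2, 3}, {10, 11} and {20}.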
The dendrogram classes for hierarchical clustering results have been cleaned of their indexing behavior, which was only necessary for the previous implementations.
For the new API, there’s a completely new clustering tutorial which, among other things, uses linguistic examples such as clustering documents by topic or by entity mentioned. We’ve included Bagga and Baldwin’s John Smith data (197 New York Times articles, each annotated for which of 35 different John Smiths it mentions); it’s available as the tarball demos/data/johnSmith.tar.gz.
LingPipe in Eclipse
We added a tutorial on LingPipe in Eclipse, which explains how to get started building LingPipe in the Eclipse Integrated Development Environment (IDE).
Distance and Proximity
Two generic interfaces were added to the utility package, Distance<E> and Proximity<E>. These are used not only in clustering, but also in the distance functions in the com.aliasi.spell package.
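The sketch below shows how a single class can serve as both a distance and a proximity over strings. The interface shapes, the LevenshteinDistance class, and the convention of defining proximity as negated distance are all assumptions for illustration, not the signatures in com.aliasi.util or com.aliasi.spell.

```java
// Assumed interface shapes; method names are guesses for illustration.
interface Distance<E> {
    double distance(E e1, E e2);
}

interface Proximity<E> {
    double proximity(E e1, E e2);
}

// Plain Levenshtein edit distance with unit-cost insertion, deletion
// and substitution, computed with two rolling rows of the DP table.
class LevenshteinDistance
    implements Distance<CharSequence>, Proximity<CharSequence> {

    public double distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1]
                    + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(sub, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev;   // the row just filled becomes "prev"
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical convention: larger proximity means closer,
    // so proximity is simply the negated distance.
    public double proximity(CharSequence a, CharSequence b) {
        return -distance(a, b);
    }
}
```

For example, distance("kitten", "sitting") is 3.0: two substitutions and one insertion.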
Matrices and Vectors
The com.aliasi.matrix package was simplified to remove the complexities of labeling. In the future, we plan to build this package out with sparse and memory-mapped matrices.
Iterators
Iterators that were formerly in util.Arrays, namely ArrayIterator and ArraySliceIterator, may now be found in the unified util.Iterators utility class. A new Iterators.Empty class was added to support genericity; it replaces the overloaded constant. Finally, util.SequenceIterator was rolled into util.Iterators along with the others, as util.Iterators.Sequence.
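One plausible reason a class suits generics better than a constant: a single shared constant cannot carry every element type without an unchecked cast, whereas new Empty<E>() is typed at each use. The sketch below uses hypothetical stand-ins (an Iters holder with Empty, ArraySlice and Sequence classes) that are named after, but not copied from, util.Iterators.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-ins for the kinds of iterators util.Iterators holds.
final class Iters {
    private Iters() { }

    // Type-safe empty iterator: one instance per element type.
    static final class Empty<E> implements Iterator<E> {
        public boolean hasNext() { return false; }
        public E next() { throw new NoSuchElementException(); }
        public void remove() { throw new UnsupportedOperationException(); }
    }

    // Iterates over array[start] through array[start + length - 1].
    static final class ArraySlice<E> implements Iterator<E> {
        private final E[] array;
        private final int end;
        private int next;

        ArraySlice(E[] array, int start, int length) {
            this.array = array;
            this.next = start;
            this.end = start + length;
        }

        public boolean hasNext() { return next < end; }

        public E next() {
            if (!hasNext()) throw new NoSuchElementException();
            return array[next++];
        }

        public void remove() { throw new UnsupportedOperationException(); }
    }

    // Returns every element of the first iterator, then every element
    // of the second.
    static final class Sequence<E> implements Iterator<E> {
        private final Iterator<? extends E> first;
        private final Iterator<? extends E> second;

        Sequence(Iterator<? extends E> first, Iterator<? extends E> second) {
            this.first = first;
            this.second = second;
        }

        public boolean hasNext() { return first.hasNext() || second.hasNext(); }

        public E next() {
            if (first.hasNext()) return first.next();
            return second.next();   // throws NoSuchElementException when exhausted
        }

        public void remove() { throw new UnsupportedOperationException(); }
    }
}
```

Chaining a slice of {"a", "b", "c", "d"} starting at index 1 with length 2 onto an empty iterator yields "b" then "c".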
MEDLINE Parsing Standardized
The medline.MedlineParser class was modified to implement corpus.Parser<MedlineHandler>. At the same time, medline.MedlineHandler was modified to implement corpus.Handler. The unused corpus.MedlineCitationHandler interface was removed.
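Typing a parser by its handler means a parser can only be given a handler of the right kind, and the compiler checks it. A minimal sketch of that pattern follows; Handler, Parser, CitationHandler and LineCitationParser are hypothetical stand-ins with assumed method names, not the corpus or medline classes themselves.

```java
// Hypothetical marker interface for callbacks that receive parsed data.
interface Handler { }

// Hypothetical generic parser, parameterized by the handler it feeds.
abstract class Parser<H extends Handler> {
    private H handler;

    void setHandler(H handler) { this.handler = handler; }

    H getHandler() { return handler; }

    abstract void parseString(String input);
}

// Toy pair standing in for MedlineHandler and MedlineParser:
// the handler receives one citation id per call.
interface CitationHandler extends Handler {
    void handle(String citationId);
}

// Treats each non-blank input line as a citation id.
class LineCitationParser extends Parser<CitationHandler> {
    void parseString(String input) {
        for (String line : input.split("\n")) {
            String id = line.trim();
            if (id.length() > 0) getHandler().handle(id);
        }
    }
}
```

Because LineCitationParser extends Parser<CitationHandler>, passing it any other kind of Handler is a compile-time error rather than a runtime cast failure.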
ObjectToCounter Simplified
The util.ObjectToCounter interface was removed; we only ever used the util.ObjectToCounterMap implementation, a generic version of which remains.
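A generic counter map can be sketched as a HashMap from keys to counts. The CounterMap class and its increment and getCount methods below are hypothetical, written in the spirit of util.ObjectToCounterMap rather than copied from it.

```java
import java.util.HashMap;

// Hypothetical generic counter map: counts occurrences of keys of type E.
class CounterMap<E> extends HashMap<E, Integer> {
    private static final long serialVersionUID = 1L;

    // Adds one to the count for key.
    void increment(E key) {
        increment(key, 1);
    }

    // Adds n to the count for key, starting from zero if it is absent.
    void increment(E key, int n) {
        Integer old = get(key);
        put(key, old == null ? n : old + n);
    }

    // Returns the count for key, or 0 if it was never incremented.
    int getCount(E key) {
        Integer c = get(key);
        return c == null ? 0 : c;
    }
}
```

Extending HashMap keeps the ordinary map operations available to callers; wrapping a private map instead would hide them behind a narrower interface.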
Unused Classes Removed
In the code review for generics, we found two unused classes in the com.aliasi.coref package, Entity and EntityFactory, and removed them. The util.SmallArray class was also removed. The util.StringDistance interface was removed and replaced by the generic util.Distance interface. Finally, the util.Visitor interface was removed, since the corpus.Handler interface now does its job.