[Update: 10 Feb 2014. Much has changed in Lucene since 3.0. An extensive tutorial for Lucene 4 is now available as a chapter in the book
This chapter covers search, indexing, and how to use Lucene for simple text classification tasks. A bonus feature is a quick reference guide to Lucene’s search query syntax.]
Update (24 July 2012): The tutorial has been updated for Lucene 3.6. See:
With this release of the LingPipe Book, I created a standalone version of the tutorial for version 3 of the Apache Lucene search library.
It contains about 20 pages covering the basics of analysis, indexing and search. It’s distributed with sample code and an Ant build file with targets to run the demos.
Building the Source
The Ant build file is src/applucene/build.xml and should be run from that directory. The book’s distribution is organized this way so that each chapter’s demo code is roughly standalone while still sharing libs. The example has some minor dependencies on LingPipe (jar included), but those are just for I/O and could easily be removed or replicated.
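As a rough sketch of what that looks like on the command line, assuming Ant is installed and you are starting from the root of the unpacked book distribution (the demo target names aren't listed here, so this just asks Ant to print them):

```shell
# Assumes the current directory is the root of the unpacked book distribution.
cd src/applucene

# List the targets defined in build.xml along with their descriptions.
ant -projecthelp
```

Any target that shows up in that listing can then be run as `ant <target>` from the same directory.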
More In-Depth Info on Lucene
The standard reference for Lucene is not its own site or javadoc, which are fairly limited tutorial-wise, but rather the recently released (as of February 2011) book by three Lucene committers:
- Michael McCandless, Erik Hatcher, and Otis Gospodnetić. 2010. Lucene in Action, Second Edition. Manning Press.
Looking at the Manning Press page for the book (linked above), I just realized they blurbed one of my previous blog posts, a review of Lucene in Action!
But wait, there’s more
If you’re interested in natural language, or just need a tutorial on character encodings and Java strings and I/O, you can find the rest of the LingPipe book at its home page:
Enjoy. And as always, let me know if you have any comments, here, or directly to