[Update: 10 Feb 2014. An extensive tutorial for Lucene 4 is now available as a chapter in the book
This chapter covers search, indexing, and how to use Lucene for simple text classification tasks. A bonus feature is a quick reference guide to Lucene's search query syntax.]
The current Apache Lucene Java version is 3.6, released in April of 2012. We’ve updated the Lucene 3 tutorial and the accompanying source code to bring it in line with the current API so that it doesn’t use any deprecated methods and my, there are a lot of them. Bob blogged about this tutorial back in February 2011, shortly after Lucene Java rolled over to version 3.0.
Like other 3.x minor releases, Lucene 3.6 introduces performance enhancements, bug fixes, new analyzers, and changes that bring the Lucene API in line with Solr. In addition, Lucene 3.6 anticipates Lucene 4, billed as “the next major backwards-incompatible release.”
Significant changes since version 3.0
IndexReaderdelete methods are deprecated and will be removed entirely in Lucene 4. All deletes and updates are done via an
- There is a single
IndexWriterconstructor that takes two arguments: the index directory and an
IndexWriterConfigobject. The latter was introduced in Lucene 3.1. It holds configuration information that was previously specified directly as additional arguments to the constructor.
IndexWriteroptimize methods are deprecated. The merge method(s) supply this functionality.
Building the Source
The ant build file is in the file
src/applucene/build.xml and should be run from that directory. The book’s distribution is organized this way so that each chapter’s demo code is roughly standalone, but they are able to share libs. There are some minor dependencies on LingPipe in the example (jar included), but those are just for I/O and could be easily removed or replicated. As an added bonus, the source code now includes the data used in the examples throughout the tutorial, the venerable Federalist Papers from Project Gutenberg.