There’s a book about LingPipe!
- Konchady, Manu. 2008. Building Search Applications: Lucene, LingPipe, and Gate. Mustru Publishing.
The title is linked to the Amazon page; it’s also available as an inexpensive download from Lulu.
The Bottom Line
The subtitle “A practical guide to building search applications using open source software” pretty much sums it up (comment added June 22, 2008: please see Seth Grimes’s comment below about LingPipe’s royalty-free license not being compatible with other open-source licenses). It takes a reader who knows Java, but nothing at all about search or the associated text processing algorithms, and provides a hands-on, step-by-step guide to building a state-of-the-art search engine.
I (Bob) gave Manu feedback on a draft, but there wasn’t much to correct on the LingPipe side, so I can vouch for the book’s technical accuracy. (Disclaimer: I didn’t actually try to run the code.)
Chapter by Chapter Overview
After (1) a brief discussion of application issues, the chapters include (2) tokenization in all three frameworks, (3) indexing with Lucene, (4) searching with Lucene, (5) sentence extraction, part-of-speech tagging, interesting/significant phrase extraction, and entity extraction with LingPipe and Gate, (6) clustering with LingPipe, (7) topic and language classification with LingPipe, (8) enterprise and web search, page rank/authority calculation, and crawling with Nutch, (9) tracking news, sentiment analysis with LingPipe, and detecting offensive content and plagiarism, and finally, (10) future directions including vertical search, tag-based search, and question answering.
For those wanting introductions to the LingPipe APIs mentioned above, Konchady’s book is a gentler starting point than our own tutorials.
That may sound like a whole lot of ground to cover in 400 pages, but Konchady pulls the reader along by illustrating everything with working code and not getting bogged down in theoretical boundary conditions. There are pointers to theory, and a bit of math where necessary, but the book never loses sight of its goal of providing a practical introduction. In that way, it’s like the Manning in Action series.
The book’s hot off the presses, so it’s up to date with Lucene 2.3 and LingPipe 3.3.
About the Author
Manu Konchady’s an old hand at search and text processing. You may remember him from such books as Text Mining Application Programming and High Performance Parallel Algorithms for Scientific Computing with Application to a Coupled Ocean Model.