<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Lucene 2.4 in 60 seconds</title>
	<atom:link href="http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Sat, 04 Feb 2012 20:56:48 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/#comment-16168</link>
		<dc:creator><![CDATA[Bob Carpenter]]></dc:creator>
		<pubDate>Wed, 19 Oct 2011 05:24:51 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=698#comment-16168</guid>
		<description><![CDATA[I&#039;m afraid not --- I&#039;ve never gotten through the formulas.  I believe there&#039;s been discussion in the past on the Lucene mailing lists about BM25.]]></description>
		<content:encoded><![CDATA[<p>I&#8217;m afraid not &#8212; I&#8217;ve never gotten through the formulas.  I believe there&#8217;s been discussion in the past on the Lucene mailing lists about BM25.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: TrulyYours</title>
		<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/#comment-16148</link>
		<dc:creator><![CDATA[TrulyYours]]></dc:creator>
		<pubDate>Mon, 17 Oct 2011 03:26:11 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=698#comment-16148</guid>
		<description><![CDATA[Very good tutorial. Simple and easy to understand. Do you have any tutorials about the BM25 algorithm?]]></description>
		<content:encoded><![CDATA[<p>Very good tutorial. Simple and easy to understand. Do you have any tutorials about the BM25 algorithm?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/#comment-15170</link>
		<dc:creator><![CDATA[Bob Carpenter]]></dc:creator>
		<pubDate>Mon, 27 Jun 2011 17:47:59 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=698#comment-15170</guid>
		<description><![CDATA[That&#039;s a bunch of different questions.  It may be characters instead of tokens, because a field is constructed with a string or stream of characters.  

The Lucene documentation is perhaps not the best place to look.  Go and look at the source code, and you should be able to track down what it&#039;s doing.  Or write a test case.

Lucene also has a very responsive user&#039;s mailing list.

As for what gets skipped, that depends on the Analyzer implementation.  Lucene has analyzers for many languages in its extended distribution (beyond the core jar).  You can also define your own stop words.  Typically, punctuation gets removed, too.]]></description>
		<content:encoded><![CDATA[<p>That&#8217;s a bunch of different questions.  It may be characters instead of tokens, because a field is constructed with a string or stream of characters.  </p>
<p>The Lucene documentation is perhaps not the best place to look.  Go and look at the source code, and you should be able to track down what it&#8217;s doing.  Or write a test case.  </p>
<p>Lucene also has a very responsive user&#8217;s mailing list.</p>
<p>As for what gets skipped, that depends on the Analyzer implementation.  Lucene has analyzers for many languages in its extended distribution (beyond the core jar).  You can also define your own stop words.  Typically, punctuation gets removed, too.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul</title>
		<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/#comment-15168</link>
		<dc:creator><![CDATA[Paul]]></dc:creator>
		<pubDate>Mon, 27 Jun 2011 14:55:56 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=698#comment-15168</guid>
		<description><![CDATA[Just a quick question: while checking the Lucene configuration, I found differing explanations of the &quot;MaxFieldLength&quot; parameter.

&quot;The constant IndexWriter.MaxFieldLength.LIMITED defaults to 10,000 characters.&quot;
- Is it really characters, or maybe words? The Lucene page says &quot;The maximum number of terms that will be indexed for a single field in a ...&quot;. Also, I wonder whether it only skips English filler words (and, but, if, etc.) or those of other languages as well.]]></description>
		<content:encoded><![CDATA[<p>Just a quick question: while checking the Lucene configuration, I found differing explanations of the &#8220;MaxFieldLength&#8221; parameter.</p>
<p>&#8220;The constant IndexWriter.MaxFieldLength.LIMITED defaults to 10,000 characters.&#8221;<br />
- Is it really characters, or maybe words? The Lucene page says &#8220;The maximum number of terms that will be indexed for a single field in a &#8230;&#8221;. Also, I wonder whether it only skips English filler words (and, but, if, etc.) or those of other languages as well.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sundus Hassan</title>
		<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/#comment-12942</link>
		<dc:creator><![CDATA[Sundus Hassan]]></dc:creator>
		<pubDate>Wed, 02 Mar 2011 07:29:36 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=698#comment-12942</guid>
		<description><![CDATA[Is Lucene a knowledge base like Wikitology?
That is, Wikitology has its own knowledge base and an index built on Wikipedia.
But in the examples I have gone through so far, we build the index on text given by the user, not on any knowledge base.

Please help me clear up this concept.

I look forward to your reply.

Thanks in advance.]]></description>
		<content:encoded><![CDATA[<p>Is Lucene a knowledge base like Wikitology?<br />
That is, Wikitology has its own knowledge base and an index built on Wikipedia.<br />
But in the examples I have gone through so far, we build the index on text given by the user, not on any knowledge base. </p>
<p>Please help me clear up this concept. </p>
<p>I look forward to your reply. </p>
<p>Thanks in advance.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/#comment-6843</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Mon, 10 May 2010 21:56:12 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=698#comment-6843</guid>
		<description><![CDATA[While it&#039;s nice that the new book is out, I don&#039;t think the authors or Manning Press would appreciate the pirated PDF!  So I&#039;m erasing the link.]]></description>
		<content:encoded><![CDATA[<p>While it&#8217;s nice that the new book is out, I don&#8217;t think the authors or Manning Press would appreciate the pirated PDF!  So I&#8217;m erasing the link.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pankil Patel</title>
		<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/#comment-6842</link>
		<dc:creator><![CDATA[Pankil Patel]]></dc:creator>
		<pubDate>Mon, 10 May 2010 15:12:59 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=698#comment-6842</guid>
		<description><![CDATA[[link to copy of new Lucene in Action Book removed].  ]]></description>
		<content:encoded><![CDATA[<p>[link to copy of new Lucene in Action Book removed].</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/#comment-4142</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Tue, 10 Mar 2009 16:44:50 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=698#comment-4142</guid>
		<description><![CDATA[Everyone: 

This post and comment thread wasn&#039;t intended to be a replacement for the:

&lt;a href=&quot;http://lucene.apache.org/java/docs/mailinglists.html&quot; rel=&quot;nofollow&quot;&gt;Lucene Mailing Lists&lt;/a&gt;

They get a lot of traffic, but are good at answering questions.

Having said that, these questions are easy:

@Antonio: yes.  try it.

@Hari: yes, set the default field in the query parser.]]></description>
		<content:encoded><![CDATA[<p>Everyone: </p>
<p>This post and comment thread wasn&#8217;t intended to be a replacement for the:</p>
<p><a href="http://lucene.apache.org/java/docs/mailinglists.html" rel="nofollow">Lucene Mailing Lists</a></p>
<p>They get a lot of traffic, but are good at answering questions.</p>
<p>Having said that, these questions are easy:</p>
<p>@Antonio: yes.  try it.</p>
<p>@Hari: yes, set the default field in the query parser.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: hari</title>
		<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/#comment-4141</link>
		<dc:creator><![CDATA[hari]]></dc:creator>
		<pubDate>Tue, 10 Mar 2009 13:37:56 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=698#comment-4141</guid>
		<description><![CDATA[Hi,

I am using Lucene 2.4 to index and search documents. I have a requirement to search each document for a keyword across multiple fields. Can I search the documents without specifying their field names?

Thanks]]></description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I am using Lucene 2.4 to index and search documents. I have a requirement to search each document for a keyword across multiple fields. Can I search the documents without specifying their field names? </p>
<p>Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Antonio</title>
		<link>http://lingpipe-blog.com/2009/02/18/lucene-24-in-60-seconds/#comment-4137</link>
		<dc:creator><![CDATA[Antonio]]></dc:creator>
		<pubDate>Tue, 10 Mar 2009 09:41:51 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=698#comment-4137</guid>
		<description><![CDATA[Hi.

I&#039;m using Lucene 2.4 to index a document with the following text:

HELLO JAMES WELCOME

[stored/uncompressed,indexed, stored/uncompressed,indexed, stored/uncompressed,indexed]


HELLO FATHER GOODBYE JAMES

[stored/uncompressed,indexed, stored/uncompressed,indexed]


HELLO FATHER WELCOME FATHER

[stored/uncompressed,indexed]


GOODBYE JAMES GOODBYE FATHER

[stored/uncompressed,indexed, stored/uncompressed,indexed, stored/uncompressed,indexed]


For each line I created a new document with doc = new Document(); I stored the text in the index with doc.add(new Field(&quot;p&quot;, line, Field.Store.YES, Field.Index.NOT_ANALYZED)); the line number of each phrase with doc.add(new Field(&quot;numLine&quot;, numLine, Field.Store.YES, Field.Index.NOT_ANALYZED)); and finally, for each person&#039;s name, I added a field with doc.add(new Field(&quot;name&quot;, name_person, Field.Store.YES, Field.Index.NOT_ANALYZED));

I want to build a searcher that returns every document and field matching the search term. For instance, searching for &quot;JAMES&quot; should return documents 1, 2 and 4; searching for &quot;HELLO FATHER&quot; should return documents 2 and 3; and searching for &quot;FATHER&quot; should return documents 2, 4 and 3 (twice)....but...

  If I search &quot;hello&quot; the result is document 1.
  If I search &quot;hello f*&quot; the result is document 1.
  If I search &quot;father&quot; there is no result.
  If I search &quot;james&quot; the result is documents 1, 2 and 4. --&gt; This is the only correct one.

Right now I only know how to search for one word; for example, for &quot;JAMES&quot; I use the query: name: james.

How can I search for a phrase or more than one word? Would it be something like the query: p:hello p:father ?

Bye]]></description>
		<content:encoded><![CDATA[<p>Hi.</p>
<p>I&#8217;m using Lucene 2.4 to index a document with the following text:</p>
<p>HELLO JAMES WELCOME</p>
<p>[stored/uncompressed,indexed, stored/uncompressed,indexed, stored/uncompressed,indexed]</p>
<p>HELLO FATHER GOODBYE JAMES</p>
<p>[stored/uncompressed,indexed, stored/uncompressed,indexed]</p>
<p>HELLO FATHER WELCOME FATHER</p>
<p>[stored/uncompressed,indexed]</p>
<p>GOODBYE JAMES GOODBYE FATHER</p>
<p>[stored/uncompressed,indexed, stored/uncompressed,indexed, stored/uncompressed,indexed]</p>
<p>For each line I created a new document with doc = new Document(); I stored the text in the index with doc.add(new Field(&#8220;p&#8221;, line, Field.Store.YES, Field.Index.NOT_ANALYZED)); the line number of each phrase with doc.add(new Field(&#8220;numLine&#8221;, numLine, Field.Store.YES, Field.Index.NOT_ANALYZED)); and finally, for each person&#8217;s name, I added a field with doc.add(new Field(&#8220;name&#8221;, name_person, Field.Store.YES, Field.Index.NOT_ANALYZED));</p>
<p>I want to build a searcher that returns every document and field matching the search term. For instance, searching for &#8220;JAMES&#8221; should return documents 1, 2 and 4; searching for &#8220;HELLO FATHER&#8221; should return documents 2 and 3; and searching for &#8220;FATHER&#8221; should return documents 2, 4 and 3 (twice)&#8230;.but&#8230;</p>
<p>  If I search &#8220;hello&#8221; the result is document 1.<br />
  If I search &#8220;hello f*&#8221; the result is document 1.<br />
  If I search &#8220;father&#8221; there is no result.<br />
  If I search &#8220;james&#8221; the result is documents 1, 2 and 4. &#8211;&gt; This is the only correct one.</p>
<p>Right now I only know how to search for one word; for example, for &#8220;JAMES&#8221; I use the query: name: james.</p>
<p>How can I search for a phrase or more than one word? Would it be something like the query: p:hello p:father ?</p>
<p>Bye</p>
]]></content:encoded>
	</item>
</channel>
</rss>

