<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: To Stem or Not to Stem?</title>
	<atom:link href="http://lingpipe-blog.com/2007/03/21/to-stem-or-not-to-stem/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2007/03/21/to-stem-or-not-to-stem/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Sat, 04 Feb 2012 20:56:48 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2007/03/21/to-stem-or-not-to-stem/#comment-5731</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Mon, 26 Oct 2009 19:55:26 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/2007/03/21/to-stem-or-not-to-stem/#comment-5731</guid>
		<description><![CDATA[Indeed -- I love Church&#039;s paper.  It&#039;s nice to see someone try to measure the naivete of naive Bayes.  And I think correlation is a good measure; it&#039;s one that&#039;s picked up by things like SVD approaches.

You may be right about the downside.  I&#039;m getting very frustrated by what seems like overstemming on Google&#039;s part (I use an acronym that means one thing, they expand it into something else, or vice versa).

The other thing your comment made me realize is that there&#039;s no way to recover from overstemming.  You can recover from understemming by adding in disjunctions or prefix queries.]]></description>
		<content:encoded><![CDATA[<p>Indeed &#8212; I love Church&#8217;s paper.  It&#8217;s nice to see someone try to measure the naivete of naive Bayes.  And I think correlation is a good measure; it&#8217;s one that&#8217;s picked up by things like SVD approaches.</p>
<p>You may be right about the downside.  I&#8217;m getting very frustrated by what seems like overstemming on Google&#8217;s part (I use an acronym that means one thing, they expand it into something else, or vice versa).</p>
<p>The other thing your comment made me realize is that there&#8217;s no way to recover from overstemming.  You can recover from understemming by adding in disjunctions or prefix queries.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Schultz</title>
		<link>http://lingpipe-blog.com/2007/03/21/to-stem-or-not-to-stem/#comment-5715</link>
		<dc:creator><![CDATA[Mike Schultz]]></dc:creator>
		<pubDate>Fri, 23 Oct 2009 23:53:38 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/2007/03/21/to-stem-or-not-to-stem/#comment-5715</guid>
		<description><![CDATA[There&#039;s a problem with comparing the search performance of stemming and non-stemming systems using aggregate (i.e., average) measures.  Those results tend to say, oh, it&#039;s a wash: for some queries stemming is better, for others no stemming is better.  So let&#039;s stem.  The problem is that users&#039; perceptions weigh more heavily than an average measure can capture.  The downside of over-stemming is never made up for by the upside.  Once a user sees &quot;Amenities&quot; -&gt; &quot;Amen&quot;, &quot;Heine&quot; -&gt; &quot;Hein&quot;, &quot;Amy&quot; -&gt; &quot;Ami&quot;, &quot;productivity&quot; -&gt; &quot;produce&quot; or any of the other bazillion wretched side effects of the Porter stemmer, you have lost their trust.  I think the prudent dividing line is the inflectional/derivational distinction.  People expect plurals to be conflated with singulars, but Noun-&gt;Adjective-&gt;Verb-&gt;Noun?  Please, that was a bad idea 30 years ago when it was born.

A great article on the efficacy of stemming is &quot;One Term or Two?&quot; from Ken Church.  The way to motivate stemming is by being convinced that two terms should be represented by one in the first place.  If not, then not.]]></description>
		<content:encoded><![CDATA[<p>There&#8217;s a problem with comparing the search performance of stemming and non-stemming systems using aggregate (i.e., average) measures.  Those results tend to say, oh, it&#8217;s a wash: for some queries stemming is better, for others no stemming is better.  So let&#8217;s stem.  The problem is that users&#8217; perceptions weigh more heavily than an average measure can capture.  The downside of over-stemming is never made up for by the upside.  Once a user sees &#8220;Amenities&#8221; -&gt; &#8220;Amen&#8221;, &#8220;Heine&#8221; -&gt; &#8220;Hein&#8221;, &#8220;Amy&#8221; -&gt; &#8220;Ami&#8221;, &#8220;productivity&#8221; -&gt; &#8220;produce&#8221; or any of the other bazillion wretched side effects of the Porter stemmer, you have lost their trust.  I think the prudent dividing line is the inflectional/derivational distinction.  People expect plurals to be conflated with singulars, but Noun-&gt;Adjective-&gt;Verb-&gt;Noun?  Please, that was a bad idea 30 years ago when it was born.</p>
<p>A great article on the efficacy of stemming is &#8220;One Term or Two?&#8221; from Ken Church.  The way to motivate stemming is by being convinced that two terms should be represented by one in the first place.  If not, then not.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

