<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Feature Hash Code Collisions in Linear Classifiers</title>
	<atom:link href="http://lingpipe-blog.com/2008/05/02/feature-hash-code-collisions-in-linear-classifiers/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2008/05/02/feature-hash-code-collisions-in-linear-classifiers/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:47:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Yoav Goldberg</title>
		<link>http://lingpipe-blog.com/2008/05/02/feature-hash-code-collisions-in-linear-classifiers/#comment-2225</link>
		<dc:creator><![CDATA[Yoav Goldberg]]></dc:creator>
		<pubDate>Tue, 06 May 2008 17:51:49 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=95#comment-2225</guid>
		<description><![CDATA[My intuition is that the small set of core predictive features is quite stable, and that a very good approximation of it can be obtained by training on a relatively small sample of the data (in the setting described by Kuzman, where the interest is mostly in test time, these features can be obtained from all the data).  Once this feature set is available, a very efficient recognizer can be compiled.  So the overhead seems pretty small.

Of course, the actual benefit from this needs to be verified empirically.]]></description>
		<content:encoded><![CDATA[<p>My intuition is that the small set of core predictive features is quite stable, and that a very good approximation of it can be obtained by training on a relatively small sample of the data (in the setting described by Kuzman, where the interest is mostly in test time, these features can be obtained from all the data).  Once this feature set is available, a very efficient recognizer can be compiled.  So the overhead seems pretty small.</p>
<p>Of course, the actual benefit from this needs to be verified empirically.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://lingpipe-blog.com/2008/05/02/feature-hash-code-collisions-in-linear-classifiers/#comment-2224</link>
		<dc:creator><![CDATA[Bob Carpenter]]></dc:creator>
		<pubDate>Tue, 06 May 2008 16:19:37 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=95#comment-2224</guid>
		<description><![CDATA[You&#039;d need to inspect each feature to see if it&#039;s common.   This reintroduces something like a table lookup for every feature to see if it&#039;s a common feature.  

In addition to being common, features need to be selected for discriminative power (either by looking at fitted coefficients in a table-based model or by using some kind of filtering approach like information gain or chi-squared tests).]]></description>
		<content:encoded><![CDATA[<p>You&#8217;d need to inspect each feature to see if it&#8217;s common.   This reintroduces something like a table lookup for every feature to see if it&#8217;s a common feature.  </p>
<p>In addition to being common, features need to be selected for discriminative power (either by looking at fitted coefficients in a table-based model or by using some kind of filtering approach like information gain or chi-squared tests).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Yoav Goldberg</title>
		<link>http://lingpipe-blog.com/2008/05/02/feature-hash-code-collisions-in-linear-classifiers/#comment-2180</link>
		<dc:creator><![CDATA[Yoav Goldberg]]></dc:creator>
		<pubDate>Sun, 04 May 2008 09:53:29 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=95#comment-2180</guid>
		<description><![CDATA[Another thing worth trying: from the nature of NLP tasks, a small core of the predictive features are very common, while most of the features are rare.  Maybe allocating different ranges for common and non-common features (making sure common features do not collide internally or externally) could be beneficial to classification accuracy.  This has the possible (negligible?) cost of storing a table of ~300 entries for the common features.]]></description>
		<content:encoded><![CDATA[<p>Another thing worth trying: from the nature of NLP tasks, a small core of the predictive features are very common, while most of the features are rare.  Maybe allocating different ranges for common and non-common features (making sure common features do not collide internally or externally) could be beneficial to classification accuracy.  This has the possible (negligible?) cost of storing a table of ~300 entries for the common features.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kuzman Ganchev</title>
		<link>http://lingpipe-blog.com/2008/05/02/feature-hash-code-collisions-in-linear-classifiers/#comment-2170</link>
		<dc:creator><![CDATA[Kuzman Ganchev]]></dc:creator>
		<pubDate>Sat, 03 May 2008 03:26:19 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=95#comment-2170</guid>
		<description><![CDATA[For what it&#039;s worth, we tried adding multiple differently hashed copies of the features in some preliminary experiments.  We didn&#039;t see any improvement over single copies (I think there was a small consistent decrease in performance).  I suspect that this is because we were using a relatively small number of slots (at most twice as many as the number of features).  Certainly, if m=n, then adding sufficiently many copies will eventually make classification impossible, since all the features will be mixed with each other.  Maybe if there are many more slots than features, then adding multiple copies of features would help, since then most of the second copies will land in empty bins.]]></description>
		<content:encoded><![CDATA[<p>For what it&#8217;s worth, we tried adding multiple differently hashed copies of the features in some preliminary experiments.  We didn&#8217;t see any improvement over single copies (I think there was a small consistent decrease in performance).  I suspect that this is because we were using a relatively small number of slots (at most twice as many as the number of features).  Certainly, if m=n, then adding sufficiently many copies will eventually make classification impossible, since all the features will be mixed with each other.  Maybe if there are many more slots than features, then adding multiple copies of features would help, since then most of the second copies will land in empty bins.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

