<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Quantized Information Gain (Conditional Entropy) for Real- and Count-Valued Features</title>
	<atom:link href="http://lingpipe-blog.com/2009/05/14/quantized-information-gain-for-real-count-valued-features/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2009/05/14/quantized-information-gain-for-real-count-valued-features/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:47:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Text Classification for Sentiment Analysis &#8211; Precision and Recall &#171;streamhacker.com</title>
		<link>http://lingpipe-blog.com/2009/05/14/quantized-information-gain-for-real-count-valued-features/#comment-6889</link>
		<dc:creator><![CDATA[Text Classification for Sentiment Analysis &#8211; Precision and Recall &#171;streamhacker.com]]></dc:creator>
		<pubDate>Mon, 17 May 2010 14:50:56 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1432#comment-6889</guid>
		<description><![CDATA[[...] and only classify using sentiment rich words. This is usually done using the concept of information gain, aka mutual information, to improve feature selection, which I&#039;ll also explore in a future [...]]]></description>
		<content:encoded><![CDATA[<p>[...] and only classify using sentiment rich words. This is usually done using the concept of information gain, aka mutual information, to improve feature selection, which I&#039;ll also explore in a future [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2009/05/14/quantized-information-gain-for-real-count-valued-features/#comment-4811</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Tue, 02 Jun 2009 17:48:34 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1432#comment-4811</guid>
		<description><![CDATA[LingPipe implements L_1 regularization, even going so far as to allow varying means and variances per feature.  

The reasons to also use feature selection are (1) efficiency [stochastic gradient takes time proportional to number of non-zero features in the input], (2) accuracy [we&#039;ve found doing info gain and then fitting has better 0/1 accuracy than relying on L_1 to do both {yes, I know that opens another can of worms on not optimizing the evaluation metric directly}], and (3) not all of our feature-based classifiers are regression based [we want to use this for perceptrons, for K-nearest neighbors, etc.].]]></description>
		<content:encoded><![CDATA[<p>LingPipe implements L_1 regularization, even going so far as to allow varying means and variances per feature.  </p>
<p>The reasons to also use feature selection are (1) efficiency [stochastic gradient takes time proportional to number of non-zero features in the input], (2) accuracy [we've found doing info gain and then fitting has better 0/1 accuracy than relying on L_1 to do both {yes, I know that opens another can of worms on not optimizing the evaluation metric directly}], and (3) not all of our feature-based classifiers are regression based [we want to use this for perceptrons, for K-nearest neighbors, etc.].</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jose</title>
		<link>http://lingpipe-blog.com/2009/05/14/quantized-information-gain-for-real-count-valued-features/#comment-4809</link>
		<dc:creator><![CDATA[Jose]]></dc:creator>
		<pubDate>Tue, 02 Jun 2009 14:13:59 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1432#comment-4809</guid>
		<description><![CDATA[Why not just use L_1 regularization as a means for sparsity in Logistic Regression? :-)]]></description>
		<content:encoded><![CDATA[<p>Why not just use L_1 regularization as a means for sparsity in Logistic Regression? :-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: santi</title>
		<link>http://lingpipe-blog.com/2009/05/14/quantized-information-gain-for-real-count-valued-features/#comment-4694</link>
		<dc:creator><![CDATA[santi]]></dc:creator>
		<pubDate>Sat, 16 May 2009 17:47:55 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1432#comment-4694</guid>
		<description><![CDATA[Yes, MDL works great as a stopping criterion. 

It is easy to overfit if we try too hard when discretizing the features before applying IG (for example, by allowing too many bins).  Another option that can work surprisingly well is a single best-split-point approach, that is, restricting ourselves to binary splits. One looks for the min-entropy split value for each feature by sorting the instances by that feature and measuring the entropy at the interesting points. Interesting points are those where there is a change of class, since those are the only points where the maximum IG can be found.]]></description>
		<content:encoded><![CDATA[<p>Yes, MDL works great as a stopping criterion. </p>
<p>It is easy to overfit if we try too hard when discretizing the features before applying IG (for example, by allowing too many bins).  Another option that can work surprisingly well is a single best-split-point approach, that is, restricting ourselves to binary splits. One looks for the min-entropy split value for each feature by sorting the instances by that feature and measuring the entropy at the interesting points. Interesting points are those where there is a change of class, since those are the only points where the maximum IG can be found.</p>
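<p>The best-split-point search described above can be sketched roughly as follows. This is only an illustrative sketch, not code from LingPipe or from any of the cited papers: the function names <code>entropy</code> and <code>best_binary_split</code> are made up for the example, thresholds are assumed to be midpoints between adjacent distinct feature values, and tied feature values are grouped so that a boundary is skipped only when both neighboring groups are pure and of the same class.</p>

```python
import math
from collections import Counter
from itertools import groupby


def entropy(counts):
    """Shannon entropy (in bits) of a table of class counts."""
    total = sum(counts.values())
    return sum(c / total * math.log2(total / c) for c in counts.values() if c > 0)


def best_binary_split(values, labels):
    """Return (threshold, weighted_entropy) of the min-entropy binary split.

    Hypothetical helper for illustration. Minimizing the weighted class
    entropy of the two sides is the same as maximizing information gain.
    Returns (None, H(class)) when no split improves on not splitting.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    # One (value, class counts) entry per distinct feature value.
    groups = [(v, Counter(y for _, y in g))
              for v, g in groupby(pairs, key=lambda p: p[0])]
    n = len(pairs)
    left, right = Counter(), Counter(labels)
    best_threshold, best_entropy = None, entropy(right)
    for (v, counts), (next_v, next_counts) in zip(groups, groups[1:]):
        left += counts
        right -= counts
        # Not an "interesting point": both neighbors are pure, same class.
        if len(counts) == 1 and counts.keys() == next_counts.keys():
            continue
        n_left = sum(left.values())
        h = (n_left * entropy(left) + (n - n_left) * entropy(right)) / n
        if h < best_entropy:
            best_threshold, best_entropy = (v + next_v) / 2, h
    return best_threshold, best_entropy
```

<p>Calling <code>best_binary_split(feature_values, class_labels)</code> once per feature and ranking features by the returned entropy gives a simple binary-split version of IG-based selection.</p>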
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Reid</title>
		<link>http://lingpipe-blog.com/2009/05/14/quantized-information-gain-for-real-count-valued-features/#comment-4686</link>
		<dc:creator><![CDATA[Mark Reid]]></dc:creator>
		<pubDate>Fri, 15 May 2009 04:54:17 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1432#comment-4686</guid>
		<description><![CDATA[Hi,

I briefly looked into the same problem a few years ago and tried out a number of different techniques. The most appealing to me were the entropy-based dynamic methods, which essentially selected a number of bins and their thresholds based on a kind of minimum description length principle.

I can&#039;t remember the exact method I used but a quick search using terms like &quot;entropy&quot; and &quot;discretization&quot; found the following papers which look vaguely familiar: &lt;a href=&quot;http://dx.doi.org/10.1016/j.cor.2005.01.022&quot; rel=&quot;nofollow&quot;&gt;Evaluating the performance of cost-based discretization versus entropy- and error-based discretization&lt;/a&gt; and &lt;a href=&quot;https://www.aaai.org/Papers/KDD/1996/KDD96-019.pdf&quot; rel=&quot;nofollow&quot;&gt;Error-based and entropy-based discretization of continuous features&lt;/a&gt;. The categories of discretization in the conclusions of the latter paper are useful.

I also stumbled across this survey which may have some more pointers: &lt;a href=&quot;http://www.math.upatras.gr/~esdlab/en/members/kotsiantis/discretization%20survey%20kotsiantis.pdf&quot; rel=&quot;nofollow&quot;&gt;Discretization techniques: a recent survey&lt;/a&gt;.

Hopefully some of that may help.]]></description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I briefly looked into the same problem a few years ago and tried out a number of different techniques. The most appealing to me were the entropy-based dynamic methods, which essentially selected a number of bins and their thresholds based on a kind of minimum description length principle.</p>
<p>I can&#8217;t remember the exact method I used but a quick search using terms like &#8220;entropy&#8221; and &#8220;discretization&#8221; found the following papers which look vaguely familiar: <a href="http://dx.doi.org/10.1016/j.cor.2005.01.022" rel="nofollow">Evaluating the performance of cost-based discretization versus entropy- and error-based discretization</a> and <a href="https://www.aaai.org/Papers/KDD/1996/KDD96-019.pdf" rel="nofollow">Error-based and entropy-based discretization of continuous features</a>. The categories of discretization in the conclusions of the latter paper are useful.</p>
<p>I also stumbled across this survey which may have some more pointers: <a href="http://www.math.upatras.gr/~esdlab/en/members/kotsiantis/discretization%20survey%20kotsiantis.pdf" rel="nofollow">Discretization techniques: a recent survey</a>.</p>
<p>Hopefully some of that may help.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

