<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Dasgupta and Hsu (2008) Hierarchical Sampling for Active Learning</title>
	<atom:link href="http://lingpipe-blog.com/2009/06/17/dasgupta-and-hsu-2008-hierarchical-sampling-for-active-learning/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2009/06/17/dasgupta-and-hsu-2008-hierarchical-sampling-for-active-learning/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:47:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2009/06/17/dasgupta-and-hsu-2008-hierarchical-sampling-for-active-learning/#comment-4932</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Tue, 23 Jun 2009 19:59:12 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1600#comment-4932</guid>
		<description><![CDATA[I just found this paper, which is again hinting at the same kind of approach, but not quite getting there:
 
Jingbo Zhu, Huizhen Wang, Tianshun Yao, and Benjamin K Tsou. 2008. &lt;a href=&quot;http://www.aclweb.org/anthology-new/C/C08/C08-1143.pdf&quot; rel=&quot;nofollow&quot;&gt;Active Learning with Sampling by Uncertainty and Density for Word Sense Disambiguation and Text Classification&lt;/a&gt;.  In &lt;i&gt;COLING&lt;/i&gt;.

They use KNN-density estimation to try to find outliers, but rather than sampling as k-means++ does, they take the K most extreme.]]></description>
		<content:encoded><![CDATA[<p>I just found this paper, which is again hinting at the same kind of approach, but not quite getting there:</p>
<p>Jingbo Zhu, Huizhen Wang, Tianshun Yao, and Benjamin K Tsou. 2008. <a href="http://www.aclweb.org/anthology-new/C/C08/C08-1143.pdf" rel="nofollow">Active Learning with Sampling by Uncertainty and Density for Word Sense Disambiguation and Text Classification</a>.  In <i>COLING</i>.</p>
<p>They use KNN-density estimation to try to find outliers, but rather than sampling as k-means++ does, they take the K most extreme.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2009/06/17/dasgupta-and-hsu-2008-hierarchical-sampling-for-active-learning/#comment-4927</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Tue, 23 Jun 2009 02:56:48 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1600#comment-4927</guid>
		<description><![CDATA[I really didn&#039;t have the Bootstrap-LV algorithm of Saar-Tsechansky and Provost in mind -- that samples based on the variance among a bunch of classifiers.  I &lt;b&gt;did&lt;/b&gt; mean what you&#039;re calling BootM (M is for &quot;margin&quot; based), which you attribute to Melville and Mooney et al.

I was concerned with the issue you bring up on p. 102 (real page, not PDF page), citing Xiao et al., about weight sampling.  That&#039;s exactly the balance k-means++ is supposed to get right.  K-means++ samples the next cluster centroid with probability proportional to its squared Euclidean distance to the closest existing centroid.  In a generative model, that could be sampling an example with probability proportional to something like minimum joint probability of category and example.  With a spherical Gaussian classifier with uniform distribution over categories, k-means++ is doing something similar.

I&#039;m not sure what you&#039;d do in a discriminative model to get the right distribution.  I was hoping someone would have a nice evaluation of a bunch of these ideas somewhere.

To emphasize outliers, can&#039;t you just raise whatever the metric is to some power -- just the opposite of the usual trick to deemphasize them in annealing?  In the limit, you just get the approach where you choose the most uncertain example each time, which overemphasizes outliers for most applications.]]></description>
		<content:encoded><![CDATA[<p>I really didn&#8217;t have the Bootstrap-LV algorithm of Saar-Tsechansky and Provost in mind &#8212; that samples based on the variance among a bunch of classifiers.  I <b>did</b> mean what you&#8217;re calling BootM (M is for &#8220;margin&#8221; based), which you attribute to Melville and Mooney et al.</p>
<p>I was concerned with the issue you bring up on p. 102 (real page, not PDF page), citing Xiao et al., about weight sampling.  That&#8217;s exactly the balance k-means++ is supposed to get right.  K-means++ samples the next cluster centroid with probability proportional to its squared Euclidean distance to the closest existing centroid.  In a generative model, that could be sampling an example with probability proportional to something like minimum joint probability of category and example.  With a spherical Gaussian classifier with uniform distribution over categories, k-means++ is doing something similar.</p>
<p>I&#8217;m not sure what you&#8217;d do in a discriminative model to get the right distribution.  I was hoping someone would have a nice evaluation of a bunch of these ideas somewhere.</p>
<p>To emphasize outliers, can&#8217;t you just raise whatever the metric is to some power &#8212; just the opposite of the usual trick to deemphasize them in annealing?  In the limit, you just get the approach where you choose the most uncertain example each time, which overemphasizes outliers for most applications.</p>
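<p>A minimal sketch of the two sampling ideas above, assuming points are plain tuples of numbers. The names <code>d2_weights</code> and <code>sample_next</code> and the <code>power</code> parameter are hypothetical, chosen only for illustration: <code>power=1.0</code> gives the k-means++ weighting by squared Euclidean distance to the closest centroid, and larger values shift more of the probability mass onto the farthest points.</p>
<pre>
```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def d2_weights(points, centers, power=1.0):
    """Unnormalized sampling weight for each point.

    Each weight is the squared distance to the closest chosen center,
    raised to `power` (hypothetical knob; 1.0 is plain k-means++).
    """
    return [min(sq_dist(p, c) for c in centers) ** power for p in points]


def sample_next(points, centers, power=1.0, rng=random):
    """Draw one point with probability proportional to its weight."""
    weights = d2_weights(points, centers, power)
    if sum(weights) == 0:
        # Every point coincides with a center; fall back to uniform.
        return rng.choice(points)
    return rng.choices(points, weights=weights, k=1)[0]


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0)]
    chosen = [(0, 0)]
    print(d2_weights(pts, chosen))
    print(sample_next(pts, chosen, power=2.0, rng=random.Random(0)))
```
</pre>
<p>As <code>power</code> grows, the draw approaches always taking the single farthest point, which is the deterministic limit mentioned above.</p>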
]]></content:encoded>
	</item>
	<item>
		<title>By: Ken Dwyer</title>
		<link>http://lingpipe-blog.com/2009/06/17/dasgupta-and-hsu-2008-hierarchical-sampling-for-active-learning/#comment-4923</link>
		<dc:creator><![CDATA[Ken Dwyer]]></dc:creator>
		<pubDate>Mon, 22 Jun 2009 17:30:57 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1600#comment-4923</guid>
		<description><![CDATA[If I understand your idea correctly, the Bootstrap-LV algorithm in the Provost paper you cite does something similar to what you suggest. It samples proportional to the variance in the class probability estimate for a given example.

However, this strategy seems to degrade when the (prior) class distribution is highly skewed, since the total weight of the minority class examples, which have high variance/interestingness &#039;scores&#039;, may be dwarfed by the weight of the majority class examples due to their sheer numbers. A potential outcome is that you select very few minority class examples. An illustration of this phenomenon is given in my M.Sc. thesis, pp. 102-104.

http://www.cs.ualberta.ca/~dwyer/files/msc_thesis.pdf

The recent paper &quot;Importance-weighted active learning&quot; by Beygelzimer, Dasgupta, and Langford is on my to-read list, and may propose a more robust solution.]]></description>
		<content:encoded><![CDATA[<p>If I understand your idea correctly, the Bootstrap-LV algorithm in the Provost paper you cite does something similar to what you suggest. It samples proportional to the variance in the class probability estimate for a given example.</p>
<p>However, this strategy seems to degrade when the (prior) class distribution is highly skewed, since the total weight of the minority class examples, which have high variance/interestingness &#8216;scores&#8217;, may be dwarfed by the weight of the majority class examples due to their sheer numbers. A potential outcome is that you select very few minority class examples. An illustration of this phenomenon is given in my M.Sc. thesis, pp. 102-104.</p>
<p><a href="http://www.cs.ualberta.ca/~dwyer/files/msc_thesis.pdf" rel="nofollow">http://www.cs.ualberta.ca/~dwyer/files/msc_thesis.pdf</a></p>
<p>The recent paper &#8220;Importance-weighted active learning&#8221; by Beygelzimer, Dasgupta, and Langford is on my to-read list, and may propose a more robust solution.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

