<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: How Can I Build a Classifier with no Negative Data?</title>
	<atom:link href="http://lingpipe-blog.com/2008/07/17/how-can-i-build-a-classifier-with-no-negative-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2008/07/17/how-can-i-build-a-classifier-with-no-negative-data/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:47:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: todd.</title>
		<link>http://lingpipe-blog.com/2008/07/17/how-can-i-build-a-classifier-with-no-negative-data/#comment-2618</link>
		<dc:creator><![CDATA[todd.]]></dc:creator>
		<pubDate>Mon, 21 Jul 2008 22:37:52 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=110#comment-2618</guid>
		<description><![CDATA[Well, there was prize money. The contest ended in June. The final results are &lt;a href=&quot;http://mill.ucsd.edu/index.php?page=Results&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;. Groups called &quot;Cocorico&quot; and &quot;AllYourBayes&quot; placed 5th and 6th, respectively, in the semi-supervised task. However, those two teams merged before the end of the contest. I was on AllYourBayes, but didn&#039;t work much on the semi-supervised task. I&#039;ll try to see if I can get the rest of the group to show up and talk about what worked and what didn&#039;t.]]></description>
		<content:encoded><![CDATA[<p>Well, there was prize money. The contest ended in June. The final results are <a href="http://mill.ucsd.edu/index.php?page=Results" rel="nofollow">here</a>. Groups called &#8220;Cocorico&#8221; and &#8220;AllYourBayes&#8221; placed 5th and 6th, respectively, in the semi-supervised task. However, those two teams merged before the end of the contest. I was on AllYourBayes, but didn&#8217;t work much on the semi-supervised task. I&#8217;ll try to see if I can get the rest of the group to show up and talk about what worked and what didn&#8217;t.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2008/07/17/how-can-i-build-a-classifier-with-no-negative-data/#comment-2606</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Fri, 18 Jul 2008 15:43:32 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=110#comment-2606</guid>
		<description><![CDATA[Awesome.  Thanks, Todd, and thanks &lt;a href=&quot;http://conflate.net/inductio/&quot; rel=&quot;nofollow&quot;&gt;Mark&lt;/a&gt; (I&#039;ve added &lt;a href=&quot;http://conflate.net/inductio/&quot; rel=&quot;nofollow&quot;&gt;your blog&lt;/a&gt; to our blog roll).

The SVM approach is interesting, though the title of the paper is confusing, given that the authors conclude the neural network approach performs equally well and is more robust.  What I was planning was similar to Schölkopf&#039;s one-class SVM.  I&#039;m surprised that their method of using outliers in the positive data (what I&#039;m evaluating right now) is more effective.  It reminds me of using entries on the n-best list as negative examples for sequence tagging.  Unfortunately, the overall results are rather disappointing, in the 50% range.  I should emphasize that we&#039;d really like an approach that could balance precision and recall for different applications (for search we want high recall in the tail; for some other apps, we need high precision).

The problem posed for the &lt;a href=&quot;http://mill.ucsd.edu/index.php?page=Datasets&amp;subpage=Task2&quot; rel=&quot;nofollow&quot;&gt;2008 UC San Diego Data Mining Contest: Positive-Only Semi-Supervised Task&lt;/a&gt; is exactly what we&#039;re trying to do, though stated in the form of 20 real-valued features instead of raw texts.  What&#039;s amazing to me is that there are about 50 distinct entries on the &lt;a href=&quot;http://mill.ucsd.edu/index.php?page=Leaderboard&quot; rel=&quot;nofollow&quot;&gt;Leaderboard&lt;/a&gt;.  There&#039;s even &lt;a href=&quot;http://mill.ucsd.edu/index2.php?page=Prizes&quot; rel=&quot;nofollow&quot;&gt;prize money&lt;/a&gt; for &lt;a href=&quot;http://en.wikipedia.org/wiki/Parimutuel_betting&quot; rel=&quot;nofollow&quot;&gt;win, place and show&lt;/a&gt; associated with this contest.]]></description>
		<content:encoded><![CDATA[<p>Awesome.  Thanks, Todd, and thanks <a href="http://conflate.net/inductio/" rel="nofollow">Mark</a> (I&#8217;ve added <a href="http://conflate.net/inductio/" rel="nofollow">your blog</a> to our blog roll).</p>
<p>The SVM approach is interesting, though the title of the paper is confusing, given that the authors conclude the neural network approach performs equally well and is more robust.  What I was planning was similar to Schölkopf&#8217;s one-class SVM.  I&#8217;m surprised that their method of using outliers in the positive data (what I&#8217;m evaluating right now) is more effective.  It reminds me of using entries on the n-best list as negative examples for sequence tagging.  Unfortunately, the overall results are rather disappointing, in the 50% range.  I should emphasize that we&#8217;d really like an approach that could balance precision and recall for different applications (for search we want high recall in the tail; for some other apps, we need high precision).</p>
<p>The problem posed for the <a href="http://mill.ucsd.edu/index.php?page=Datasets&amp;subpage=Task2" rel="nofollow">2008 UC San Diego Data Mining Contest: Positive-Only Semi-Supervised Task</a> is exactly what we&#8217;re trying to do, though stated in the form of 20 real-valued features instead of raw texts.  What&#8217;s amazing to me is that there are about 50 distinct entries on the <a href="http://mill.ucsd.edu/index.php?page=Leaderboard" rel="nofollow">Leaderboard</a>.  There&#8217;s even <a href="http://mill.ucsd.edu/index2.php?page=Prizes" rel="nofollow">prize money</a> for <a href="http://en.wikipedia.org/wiki/Parimutuel_betting" rel="nofollow">win, place and show</a> associated with this contest.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: todd.</title>
		<link>http://lingpipe-blog.com/2008/07/17/how-can-i-build-a-classifier-with-no-negative-data/#comment-2604</link>
		<dc:creator><![CDATA[todd.]]></dc:creator>
		<pubDate>Fri, 18 Jul 2008 07:20:30 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=110#comment-2604</guid>
		<description><![CDATA[A friend of mine did pretty well in a UCSD data mining contest with a similar problem. There you had to classify ~11k instances based on ~70k examples, only a handful of which were labeled, and all of the labels were positive. I believe he used some form of hierarchical clustering, where clusters with some number of labeled positive instances were declared positive. 

The problem is described here: http://mill.ucsd.edu/index.php?page=Datasets&amp;subpage=Task2, and I&#039;d be happy to get more information on the method if you&#039;re interested.]]></description>
		<content:encoded><![CDATA[<p>A friend of mine did pretty well in a UCSD data mining contest with a similar problem. There you had to classify ~11k instances based on ~70k examples, only a handful of which were labeled, and all of the labels were positive. I believe he used some form of hierarchical clustering, where clusters with some number of labeled positive instances were declared positive. </p>
<p>The problem is described here: <a href="http://mill.ucsd.edu/index.php?page=Datasets&#038;subpage=Task2" rel="nofollow">http://mill.ucsd.edu/index.php?page=Datasets&#038;subpage=Task2</a>, and I&#8217;d be happy to get more information on the method if you&#8217;re interested.</p>
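<p>For what it&#8217;s worth, here is a minimal sketch of the cluster-then-label idea mentioned above, assuming single-linkage agglomerative clustering with Euclidean distance and a fixed number of clusters. The linkage, the distance, the toy data, and the names <code>single_linkage_clusters</code>, <code>label_by_cluster</code>, <code>n_clusters</code> and <code>min_positives</code> are all assumptions made for illustration; this is not the contest entry&#8217;s actual method, and the brute-force merge loop would be far too slow for ~70k examples.</p>

```python
import math
from itertools import combinations


def single_linkage_clusters(points, n_clusters):
    """Agglomerative clustering: start from singleton clusters and repeatedly
    merge the two clusters whose closest members are nearest to each other."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the minimum pairwise distance.
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # a < b always holds here, so deleting b leaves index a valid.
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


def label_by_cluster(points, positive_idx, n_clusters, min_positives=1):
    """Declare every point positive whose cluster contains at least
    min_positives of the labeled positive examples (hypothetical rule)."""
    positive_idx = set(positive_idx)
    labels = [False] * len(points)
    for cluster in single_linkage_clusters(points, n_clusters):
        if len(positive_idx.intersection(cluster)) >= min_positives:
            for i in cluster:
                labels[i] = True
    return labels


# Toy data (made up): two well-separated groups of three points each.
# Only indices 0 and 1 carry a positive label; everything else is unlabeled.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
predicted = label_by_cluster(points, positive_idx=[0, 1], n_clusters=2)
print(predicted)  # [True, True, True, False, False, False]
```

<p>In this sketch, raising <code>min_positives</code> makes the rule stricter, which in general should trade recall for precision; the right setting would depend on the application.</p>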
]]></content:encoded>
	</item>
	<item>
		<title>By: mdreid</title>
		<link>http://lingpipe-blog.com/2008/07/17/how-can-i-build-a-classifier-with-no-negative-data/#comment-2602</link>
		<dc:creator><![CDATA[mdreid]]></dc:creator>
		<pubDate>Fri, 18 Jul 2008 06:25:00 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=110#comment-2602</guid>
		<description><![CDATA[You could try using a &lt;a href=&quot;http://jmlr.csail.mit.edu/papers/volume2/manevitz01a/manevitz01a.pdf&quot; rel=&quot;nofollow&quot;&gt;one-class SVM&lt;/a&gt;.]]></description>
		<content:encoded><![CDATA[<p>You could try using a <a href="http://jmlr.csail.mit.edu/papers/volume2/manevitz01a/manevitz01a.pdf" rel="nofollow">one-class SVM</a>.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

