<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Convexity of (Root) Mean Square Error, or Why Committees Won the Netflix Prize</title>
	<atom:link href="http://lingpipe-blog.com/2009/09/29/convexity-of-root-mean-square-error-or-why-committees-won-the-netflix-prize/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2009/09/29/convexity-of-root-mean-square-error-or-why-committees-won-the-netflix-prize/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:47:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Viviane</title>
		<link>http://lingpipe-blog.com/2009/09/29/convexity-of-root-mean-square-error-or-why-committees-won-the-netflix-prize/#comment-18052</link>
		<dc:creator><![CDATA[Viviane]]></dc:creator>
		<pubDate>Sat, 04 Feb 2012 20:56:48 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=2552#comment-18052</guid>
		<description><![CDATA[Thank you Felipe. I have read your work. It is indeed the first, to my knowledge, that uses the extended (also called Fisher&#8217;s non-central) hypergeometric distribution for IR. I found that sampling from the extended hypergeometric becomes exponentially expensive as the sample size increases, and that there is ongoing research in statistics about this. While researching the use of the hypergeometric distribution for IR, I found a paper by Wilbur dating back to 1993! Wilbur models the vocabulary intersection between the query and a set of relevant documents using the central hypergeometric distribution. Little has been done since then, probably because the multinomial distribution is a good approximation to the hypergeometric for most IR scenarios, i.e., when the sample size (query) is considerably smaller than the population size (document). However, as we show in the paper, in the case of document-long queries, the multinomial approximation no longer holds, and the use of the &#8220;vanilla&#8221; hypergeometric distribution is required.]]></description>
		<content:encoded><![CDATA[<p>Thank you Felipe. I have read your work. It is indeed the first, to my knowledge, that uses the extended (also called Fisher&#8217;s non-central) hypergeometric distribution for IR. I found that sampling from the extended hypergeometric becomes exponentially expensive as the sample size increases, and that there is ongoing research in statistics about this. While researching the use of the hypergeometric distribution for IR, I found a paper by Wilbur dating back to 1993! Wilbur models the vocabulary intersection between the query and a set of relevant documents using the central hypergeometric distribution. Little has been done since then, probably because the multinomial distribution is a good approximation to the hypergeometric for most IR scenarios, i.e., when the sample size (query) is considerably smaller than the population size (document). However, as we show in the paper, in the case of document-long queries, the multinomial approximation no longer holds, and the use of the &#8220;vanilla&#8221; hypergeometric distribution is required.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Search terms and the flu: prefering complex models &#124; Ready-to-hand</title>
		<link>http://lingpipe-blog.com/2009/09/29/convexity-of-root-mean-square-error-or-why-committees-won-the-netflix-prize/#comment-5898</link>
		<dc:creator><![CDATA[Search terms and the flu: prefering complex models &#124; Ready-to-hand]]></dc:creator>
		<pubDate>Tue, 01 Dec 2009 07:54:32 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=2552#comment-5898</guid>
		<description><![CDATA[[...] One way of doing this is just taking a weighted average of the predictions of several simpler models. This works quite well when your measure of the value of your model is root mean squared error (RMSE)... [...]]]></description>
		<content:encoded><![CDATA[<p>[...] One way of doing this is just taking a weighted average of the predictions of several simpler models. This works quite well when your measure of the value of your model is root mean squared error (RMSE)&#8230; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2009/09/29/convexity-of-root-mean-square-error-or-why-committees-won-the-netflix-prize/#comment-5561</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Wed, 30 Sep 2009 16:11:02 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=2552#comment-5561</guid>
		<description><![CDATA[@Max  Good point about convexity and ranking.  Non-convex error functions make optimization-based estimation (training) difficult, too.  

I don&#039;t think the kind of non-convexity you&#039;d get from top-N or F-measure-like scores would necessarily mean committee voting wouldn&#039;t work for these kinds of tasks.  

I think precision-at-N, where &quot;relevant&quot; means a customer rents it, is a better measure for Netflix&#039;s task.  

Furthermore, I think we need more results diversity -- I don&#039;t want it to recommend seven seasons of the Simpsons even if I will eventually rent them all and rank them highly.  And I don&#039;t want to keep being shown the same movie again and again as a recommendation, so there needs to be history.

I would also like Netflix and Amazon and PubMed to go deeper into more-like-this, because I find it useful for exploration.  

Netflix&#039;s own Cinematch system, which provided the baseline, was also based on regression.  From what Netflix has said, they&#039;re already incorporating some of the techniques from the competition. Certainly I&#039;d think the features would port.]]></description>
		<content:encoded><![CDATA[<p>@Max  Good point about convexity and ranking.  Non-convex error functions make optimization-based estimation (training) difficult, too.  </p>
<p>I don&#8217;t think the kind of non-convexity you&#8217;d get from top-N or F-measure-like scores would necessarily mean committee voting wouldn&#8217;t work for these kinds of tasks.  </p>
<p>I think precision-at-N, where &#8220;relevant&#8221; means a customer rents it, is a better measure for Netflix&#8217;s task.  </p>
<p>Furthermore, I think we need more results diversity &#8212; I don&#8217;t want it to recommend seven seasons of the Simpsons even if I will eventually rent them all and rank them highly.  And I don&#8217;t want to keep being shown the same movie again and again as a recommendation, so there needs to be history.</p>
<p>I would also like Netflix and Amazon and PubMed to go deeper into more-like-this, because I find it useful for exploration.  </p>
<p>Netflix&#8217;s own Cinematch system, which provided the baseline, was also based on regression.  From what Netflix has said, they&#8217;re already incorporating some of the techniques from the competition. Certainly I&#8217;d think the features would port.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Max Gubin</title>
		<link>http://lingpipe-blog.com/2009/09/29/convexity-of-root-mean-square-error-or-why-committees-won-the-netflix-prize/#comment-5529</link>
		<dc:creator><![CDATA[Max Gubin]]></dc:creator>
		<pubDate>Wed, 30 Sep 2009 05:40:16 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=2552#comment-5529</guid>
		<description><![CDATA[Actually, the situation with a real recommendation system is more complex because it can provide only a limited number of recommendations for a user; in other words, it shows only the top N items with the best scores. Any loss function that uses such a top window is extremely non-convex with many local minima (like the IR loss functions P@N and NDCG). Popular Netflix methods like SVD and stochastic gradient descent won’t work well there. I wouldn’t be surprised if people run into this problem when they try applying Netflix competition results in real systems.]]></description>
		<content:encoded><![CDATA[<p>Actually, the situation with a real recommendation system is more complex because it can provide only a limited number of recommendations for a user; in other words, it shows only the top N items with the best scores. Any loss function that uses such a top window is extremely non-convex with many local minima (like the IR loss functions P@N and NDCG). Popular Netflix methods like SVD and stochastic gradient descent won’t work well there. I wouldn’t be surprised if people run into this problem when they try applying Netflix competition results in real systems.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

