<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: EM Clustering, Fat Gaussians and Number of Clusters</title>
	<atom:link href="http://lingpipe-blog.com/2008/03/07/em-clustering-fat-gaussians-and-number-of-clusters/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2008/03/07/em-clustering-fat-gaussians-and-number-of-clusters/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:47:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2008/03/07/em-clustering-fat-gaussians-and-number-of-clusters/#comment-2028</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Thu, 20 Mar 2008 19:08:58 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=80#comment-2028</guid>
		<description><![CDATA[Wow.  Still not used to getting comments on the blog!

Whether the number of clusters matters depends on what you&#039;re trying to do with the Bayesian analysis. The problem with something like Gibbs sampling is that the latent topics are exchangeable, so their labels won&#039;t be stable across samples. The pairwise associations can be extracted (do entities x and y show up in the same cluster, or what is the probability that they do given the model), but that&#039;s a lot of data to aggregate over samples when we need to scale.

But consider this case: estimating how many distinct entities are mentioned in a corpus of text. That&#039;s a case where you could do a Bayesian estimate of the number of clusters with a Dirichlet process (DP) prior.

- Bob]]></description>
		<content:encoded><![CDATA[<p>Wow.  Still not used to getting comments on the blog!</p>
<p>Whether the number of clusters matters depends on what you&#8217;re trying to do with the Bayesian analysis. The problem with something like Gibbs sampling is that the latent topics are exchangeable, so their labels won&#8217;t be stable across samples. The pairwise associations can be extracted (do entities x and y show up in the same cluster, or what is the probability that they do given the model), but that&#8217;s a lot of data to aggregate over samples when we need to scale.</p>
<p>But consider this case: estimating how many distinct entities are mentioned in a corpus of text. That&#8217;s a case where you could do a Bayesian estimate of the number of clusters with a Dirichlet process (DP) prior.</p>
<p>- Bob</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aleks Jakulin</title>
		<link>http://lingpipe-blog.com/2008/03/07/em-clustering-fat-gaussians-and-number-of-clusters/#comment-2017</link>
		<dc:creator><![CDATA[Aleks Jakulin]]></dc:creator>
		<pubDate>Mon, 17 Mar 2008 19:59:49 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=80#comment-2017</guid>
		<description><![CDATA[Nice post! The point of the Bayesian solution is that one does not KNOW the number of clusters - and that the number does not MATTER: what&#039;s important is to model P(X&#124;C).

But in some cases, C is interesting: we are actually interested in creating a new concept - to be used for communication and other problems. If that&#039;s the intention, C has to be crisp.]]></description>
		<content:encoded><![CDATA[<p>Nice post! The point of the Bayesian solution is that one does not KNOW the number of clusters &#8211; and that the number does not MATTER: what&#8217;s important is to model P(X|C).</p>
<p>But in some cases, C is interesting: we are actually interested in creating a new concept &#8211; to be used for communication and other problems. If that&#8217;s the intention, C has to be crisp.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

