Whether the number of clusters matters depends on what you’re trying to do with the Bayesian analysis. The problem with something like Gibbs sampling is that the latent topics are exchangeable, so their labels won’t be stable from one sample to the next. The pairwise associations can still be extracted (do entities x and y show up in the same cluster, or estimate the probability that they do under the model), but that’s a lot of data to aggregate over samples when we need to scale.
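To make the pairwise idea concrete, here is a rough sketch of how one might aggregate co-clustering frequencies over sampler output. The function name `coclustering_matrix` and the toy `samples` array are made up for illustration and don't come from any particular sampler; the only assumption is that each sample is a vector of integer cluster labels, one per entity.

```python
import numpy as np

def coclustering_matrix(samples):
    """Fraction of samples in which each pair of items shares a cluster.

    samples: array-like of shape (n_samples, n_items) holding integer
    cluster labels. Only label equality is used, so the result does not
    change if the labels are permuted within a sample.
    """
    samples = np.asarray(samples)
    n_samples, n_items = samples.shape
    counts = np.zeros((n_items, n_items))
    for z in samples:
        # Boolean n_items x n_items matrix: True where two items share a label.
        counts += (z[:, None] == z[None, :])
    return counts / n_samples

# Hypothetical assignments from three sweeps over four entities.
# The first two sweeps describe the same partition with labels swapped.
samples = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
P = coclustering_matrix(samples)
print(P)
```

The first two rows of `samples` count as agreeing even though their labels are swapped, which is why this summary survives the exchangeability problem. The matrix is quadratic in the number of entities, and that is the scaling cost mentioned above.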

But consider this case: estimating how many entity mentions there are in a corpus of text. That’s a case where you could do a Bayesian estimate of the number of clusters in a DP-prior setting.
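As a sketch of what that estimate could look like: under a Chinese restaurant process with concentration alpha, the prior expected number of clusters among n items is the sum of alpha / (alpha + i) for i from 0 to n - 1, and the posterior over the count can be approximated by tallying the distinct labels in each sample. The helper names, the value of `alpha`, and the toy `samples` below are all assumptions for illustration, not output from a real model.

```python
from collections import Counter

def prior_expected_clusters(n_items, alpha):
    """Expected number of clusters among n_items under a CRP(alpha) prior."""
    return sum(alpha / (alpha + i) for i in range(n_items))

def posterior_num_clusters(samples):
    """Empirical distribution of the number of occupied clusters per sample."""
    counts = Counter(len(set(z)) for z in samples)
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}

# Hypothetical label vectors from four sweeps over four mentions.
samples = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 2, 2],
    [0, 0, 0, 0],
]
post = posterior_num_clusters(samples)
post_mean = sum(k * p for k, p in post.items())
prior_mean = prior_expected_clusters(n_items=4, alpha=1.0)
print(post, post_mean, prior_mean)
```

The count of occupied clusters is unaffected by label switching, so it can be averaged across samples directly, unlike the topic identities themselves.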

– Bob

]]>But in some cases, C is interesting: we are actually interested in creating a new concept – to be used for communication and other problems. If that’s the intention, C has to be crisp.

]]>