Yes, the variational Bayes option in Stan uses a generic multivariate normal variational approximation. And it’s batch. I’d recommend Matt Hoffman’s highly scalable stochastic variational inference implementation in Vowpal Wabbit. It uses a custom variational approximation to LDA which should also be better than Stan’s generic multivariate normal approximation. LingPipe’s Monte Carlo implementation will be pretty fast, too. All of these methods will only find a single mode, so if you re-run with new random seeds, you’ll get different answers.

with experiments, and I think is very cool, is “A Practical Algorithm for Topic Modeling with Provable Guarantees”

No, I should ask Shay about it. He’s teaching a Bayesian NLP seminar this semester and the paper of yours and Sharon Goldwater’s on morphology is next on the reading list!

I’ve been seeing a lot of these spectral methods for mixture models (clustering) lately. Michael Collins and crew have been applying them to PCFGs and Hsu and crew have been doing HMMs.

I didn’t see a quick summary of their separability conditions, but they claim to be relaxing a requirement that there be anchor words that only show up in a single topic. I’m surprised that even having distinct anchor words is enough separability. These results are pretty amazing.
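To make the anchor-word requirement concrete, here is a minimal sketch in Python. It assumes you already have a topic-by-word probability matrix and checks whether every topic has at least one word with nonzero probability in that topic alone. The function names (`anchor_words`, `is_separable`), the tolerance, and the toy matrix are all made up for illustration; this is not code from any of the papers.

```python
import numpy as np

def anchor_words(topic_word, tol=1e-12):
    """Map each topic index to the word indices that have nonzero
    probability in that topic and in no other (hypothetical helper)."""
    topic_word = np.asarray(topic_word, dtype=float)
    support = topic_word > tol                 # (K, V): is word v used by topic k?
    topics_per_word = support.sum(axis=0)      # how many topics use each word
    anchors = {k: [] for k in range(topic_word.shape[0])}
    for v in np.flatnonzero(topics_per_word == 1):
        k = int(np.argmax(support[:, v]))      # the single topic that uses word v
        anchors[k].append(int(v))
    return anchors

def is_separable(topic_word, tol=1e-12):
    """True if every topic has at least one anchor word."""
    return all(len(ws) > 0 for ws in anchor_words(topic_word, tol).values())

# Toy example: 2 topics over 4 words. Word 0 is exclusive to topic 0,
# word 1 is exclusive to topic 1, and words 2 and 3 are shared.
toy = np.array([
    [0.5, 0.0, 0.3, 0.2],
    [0.0, 0.6, 0.3, 0.1],
])
```

With real fitted topics the probabilities are rarely exactly zero, so the tolerance would have to be chosen by hand.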

Of course, they still don’t give you full Bayesian inference over LDA.

See for example this recent paper, by Anandkumar et al.:
