We know HBC pretty well. I spoke to Hal about it at length before starting the Stan project.

We’re pretty up on all the competition. Closer to what we’re doing are the recent Church extensions that do HMC. Here’s the original paper:

but I think there were more recent things at NIPS this year.

I also know the PyMC folks were working on similar HMC + auto-diff approaches.

Also, Theano in Python is trying to do some of the same things:

Thanks for the pointers! I am sure I would enjoy playing with some of these now :-).

BUGS, OpenBUGS and JAGS all have roughly the same functionality as Stan. They use Gibbs sampling instead of Hamiltonian Monte Carlo (technically, BUGS uses adaptive rejection sampling within Gibbs and JAGS uses slice sampling within Gibbs). These can actually be faster than (adaptive) Hamiltonian MC in some cases, such as models where all the priors are conjugate. They’re all interpreted.
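Of the three, JAGS’s slice-sampling-within-Gibbs is the easiest to sketch. Here’s a minimal single-variable slice sampler with Neal-style stepping out in Python (the function name and defaults are mine, not from any of these packages):

```python
import math
import random

def slice_sample_step(x0, log_f, w=1.0, max_steps=50, rng=random):
    """One slice-sampling update for a univariate log density log_f,
    using the stepping-out procedure to locate the slice interval."""
    # draw an auxiliary height uniformly under the density at x0;
    # 1 - rng.random() lies in (0, 1], so the log is always finite
    log_y = log_f(x0) + math.log(1.0 - rng.random())
    # step out from x0 until the interval brackets the slice
    left = x0 - w * rng.random()
    right = left + w
    budget = max_steps
    while budget > 0 and log_f(left) > log_y:
        left -= w
        budget -= 1
    while budget > 0 and log_f(right) > log_y:
        right += w
        budget -= 1
    # sample uniformly from the interval, shrinking it on rejection
    while True:
        x1 = left + rng.random() * (right - left)
        if log_f(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1
```

No gradients required, which is why it can handle whatever conditionals Gibbs hands it, conjugate or not.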

There are some other compiled versions. HBC and Passage, in particular, are both written in Haskell and compile to C++. Both support more limited classes of models than BUGS and its ilk. Stan’s a bit more expressive than BUGS in terms of what can go in a model.

Koller and Friedman’s book on graphical models goes over the kinds of algorithms that can be automated, including structural models we’re not considering. Bishop’s machine learning book also has a chapter on automatic graphical model algorithms.

If you want to see some slick automatic compilation for undirected graphical models with structure, check out McCallum et al.’s Factorie.

Nope. HMC is just for continuous parameters.

Wherever practical, we just marginalize out discrete parameters. The Stan extension to BUGS makes it possible to do that in the model itself (examples to come in the manual, which I haven’t written yet).
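Marginalizing a discrete parameter just means summing it out of the joint density, done on the log scale for numerical stability. A toy Python sketch for a two-component normal mixture, with the discrete component indicator summed out (the function names are mine, not Stan syntax):

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_normal(y, mu, sigma):
    """Log density of y under Normal(mu, sigma)."""
    return (-0.5 * ((y - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))

def log_lik_mixture(y, theta, mu, sigma):
    """Log likelihood of y under a two-component normal mixture,
    with the discrete indicator z summed out:
      p(y) = theta * N(y | mu[0], sigma) + (1 - theta) * N(y | mu[1], sigma)
    computed entirely on the log scale."""
    return log_sum_exp([
        math.log(theta) + log_normal(y, mu[0], sigma),
        math.log(1.0 - theta) + log_normal(y, mu[1], sigma),
    ])
```

Once the indicator is summed out, everything that remains is continuous, so HMC applies directly.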

Otherwise, we use exact Gibbs for discrete parameters with few outcomes and slice sampling for discrete parameters with a large or unbounded number of outcomes. But we haven’t thought too hard about the discrete case. In particular, we haven’t computed the Markov blanket in such a way as to make either of these operations at all efficient.
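For what it’s worth, the exact-Gibbs update for a discrete parameter with few outcomes is just enumerate, normalize, and sample from the full conditional. A hypothetical sketch (the function name is mine):

```python
import math
import random

def gibbs_draw_discrete(log_cond, n_outcomes, rng=random):
    """Draw a discrete parameter exactly from its full conditional.
    log_cond(k) returns the unnormalized log probability of outcome k;
    only feasible when n_outcomes is small."""
    logs = [log_cond(k) for k in range(n_outcomes)]
    m = max(logs)                        # subtract max for stability
    ws = [math.exp(l - m) for l in logs]
    total = sum(ws)
    u = rng.random() * total             # inverse-CDF draw
    acc = 0.0
    for k, w in enumerate(ws):
        acc += w
        if u < acc:
            return k
    return n_outcomes - 1                # guard against rounding
```

The cost per draw is linear in the number of outcomes, which is exactly why computing the Markov blanket efficiently matters.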

HMC also works best for unbounded parameters with tails that aren’t too light (that is, no lighter than the Gaussian’s). So we’ve done a whole lot of work transforming things like positive variables (variance/precision/deviation), bounded variables (probability, correlation), simplexes, and covariance matrices. It’s been calc and matrices 101 around here, computing all the Jacobian determinants!
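Concretely, each constrained parameter is mapped to the unconstrained scale, and the log absolute Jacobian determinant of the inverse transform is added to the log density. The two simplest cases, sketched in Python (function names are mine):

```python
import math

def positive_to_unconstrained(x):
    """Map x > 0 to the whole real line via y = log(x)."""
    return math.log(x)

def unconstrained_to_positive(y):
    """Inverse transform x = exp(y); the log absolute Jacobian
    determinant, d exp(y)/dy = exp(y), is just y on the log scale."""
    x = math.exp(y)
    log_jacobian = y
    return x, log_jacobian

def unconstrained_to_probability(y):
    """Inverse logit maps the real line to (0, 1); the log absolute
    Jacobian is log p + log(1 - p), since d inv_logit/dy = p(1 - p)."""
    p = 1.0 / (1.0 + math.exp(-y))
    log_jacobian = math.log(p) + math.log(1.0 - p)
    return p, log_jacobian
```

Simplexes and covariance matrices follow the same recipe, just with multivariate transforms and determinants instead of scalar derivatives.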

The best thing to read is Radford Neal’s chapter in the new *Handbook of Markov Chain Monte Carlo*; it’s one of the sample chapters.
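Neal’s chapter walks through the leapfrog integrator at the heart of HMC: a half step on the momentum, alternating full steps on position and momentum, and a closing half momentum step. A minimal sketch for a single unconstrained parameter with unit mass (the function name is mine):

```python
def leapfrog(q, p, grad_U, eps, n_steps):
    """Simulate Hamiltonian dynamics for potential energy U
    (the negative log density) via leapfrog integration.
    q: position (the parameter), p: momentum,
    grad_U: gradient of U, eps: step size."""
    p -= 0.5 * eps * grad_U(q)           # initial half momentum step
    for step in range(n_steps):
        q += eps * p                     # full position step
        if step != n_steps - 1:
            p -= eps * grad_U(q)         # full momentum step
    p -= 0.5 * eps * grad_U(q)           # closing half momentum step
    return q, -p                         # negate p for reversibility
```

The total energy U(q) + p²/2 is approximately conserved along the trajectory, which is what keeps the Metropolis acceptance rate high.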

We’re using an adaptive version developed by Matt Hoffman, called the no-U-turn sampler (NUTS). There’s an arXiv paper.

Wow, this looks cool. I wish I could come to your presentation! I really must learn more about Hamiltonian MC. But until then, perhaps you can help me with a simple question: can Hamiltonian MC be used with discrete variables? (E.g., can you calculate the derivatives you need?)

Mark
