The tape’s a huge memory sink. I don’t see how else you can do reverse-mode, though.

We’re about ready to release the first version of our HMC sampler with auto-dif. We wound up writing our own auto-dif that’s both faster (not by much) and more flexible (by a fair bit) than either RAD/Sacado or CppAD. We based it on an extensible OO design that overrides operator new to manage the auto-dif variables on a stack.
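For the curious, here’s a toy sketch of that kind of design (this is not our actual code; the names `Arena`, `Node`, `var`, `mul`, and `grad` are made up for illustration): an overridden `operator new` bump-allocates expression-graph nodes on an arena, and the gradient is a single backward sweep over the recorded nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy bump allocator standing in for the auto-dif variable stack.
struct Arena {
    std::vector<char> buf;
    std::size_t top;
    Arena(std::size_t n) : buf(n), top(0) {}
    void* alloc(std::size_t n) {
        void* p = &buf[top];
        top += (n + 7) & ~std::size_t(7);   // keep 8-byte alignment
        return p;
    }
};
static Arena arena(1 << 20);

// One node in the expression graph; operator new puts nodes on the arena.
struct Node {
    double val, adj;
    Node *a, *b;       // operand nodes (may be null)
    double da, db;     // partials of this node w.r.t. a and b
    Node(double v) : val(v), adj(0), a(0), b(0), da(0), db(0) {}
    static void* operator new(std::size_t n) { return arena.alloc(n); }
    static void operator delete(void*) {}    // arena is freed wholesale
};

// Nodes are recorded in evaluation order, so the reverse sweep is a loop.
static std::vector<Node*> tape;

Node* var(double v) { Node* n = new Node(v); tape.push_back(n); return n; }

Node* mul(Node* x, Node* y) {
    Node* n = var(x->val * y->val);
    n->a = x; n->da = y->val;   // d(xy)/dx = y
    n->b = y; n->db = x->val;   // d(xy)/dy = x
    return n;
}

Node* add(Node* x, Node* y) {
    Node* n = var(x->val + y->val);
    n->a = x; n->da = 1;
    n->b = y; n->db = 1;
    return n;
}

// Reverse pass: seed the output's adjoint and propagate backward.
void grad(Node* f) {
    f->adj = 1;
    for (int i = (int)tape.size() - 1; i >= 0; --i) {
        Node* n = tape[i];
        if (n->a) n->a->adj += n->da * n->adj;
        if (n->b) n->b->adj += n->db * n->adj;
    }
}
```

For f = x*y + x at x = 3, y = 4, the sweep leaves x’s adjoint at y + 1 = 5 and y’s at x = 3.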

It’s very special-purpose, though. We also didn’t template out the values, so there’s not a good way to do second-order difs. But reverse-reverse isn’t ideal anyway, at least the way the standard packages go about it. For second order (i.e., Hessians), it’d be better to do N forward-reverse mode passes.
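To make the forward-reverse idea concrete: if the value type inside the reverse-mode nodes were a template parameter, you could instantiate it with a dual number seeded in direction e_i, and each reverse sweep would then yield one Hessian column (a Hessian-vector product); N such passes give the full Hessian. Here’s a minimal dual number (illustration only; none of the packages mentioned is implemented exactly this way):

```cpp
#include <cassert>

// Minimal dual number: a value plus one directional derivative (tangent).
struct Dual {
    double v, d;
    Dual(double v_, double d_ = 0.0) : v(v_), d(d_) {}
};

Dual operator+(Dual a, Dual b) { return Dual(a.v + b.v, a.d + b.d); }
Dual operator*(Dual a, Dual b) {
    return Dual(a.v * b.v, a.d * b.v + a.v * b.d);  // product rule
}
```

For example, with x = Dual(2, 1), evaluating x*x*x carries the value 8 and the derivative 3x² = 12 along in one forward pass.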

We went with Eigen, whose developers have been really helpful. For Eigen 3.0.1, they removed explicit namespaces on some of their internals, which allows all of the functions to be used (we use Cholesky, eigenvalues/vectors, determinants and inverses, as well as all the basic arithmetic). It all works very well **if all the variables are auto-dif variables**.

There’s no way in Eigen to deal with multiplying a vector of doubles by a matrix of auto-dif variables. I’ve put in a request and they’ve sketched a solution for mixed-type operations, but no one’s implemented it, and it’s not a high priority for the developers. (They hard-coded the complex/double case as a special case.)

For now, we’re just wrapping our double matrices in auto-dif variables operation by operation, so we can template everything and easily generate code (we’re generating code from model specs, BUGS-style). Wrapping is a huge waste of space (and hence time), but it does the right thing behaviorally. The real bummer is that the way I wrapped it defeats their clever expression templates. But the real speed gain will come not from laziness via templates but from short-circuiting the auto-dif to make it more efficient (who knew so many matrix ops had analytical derivatives?).
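Here’s a toy illustration of the short-circuiting point (made-up names, not our actual code): a dot product of auto-dif variables against plain doubles recorded as a single tape entry with analytic partials, d(a·b)/da_i = b_i, instead of 2n elementwise multiply-and-add records.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A tape record: a value plus (slot, partial) pairs for its operands.
struct Entry { int slot; double partial; };
struct Rec { double val; std::vector<Entry> parents; };

std::vector<Rec> tape;
std::vector<double> adj;

int leaf(double v) { tape.push_back(Rec{v, {}}); return (int)tape.size() - 1; }

// Short-circuited dot product: one record with analytic partials,
// rather than a chain of elementwise mul/add records.
int dot(const std::vector<int>& a, const std::vector<double>& b) {
    Rec r; r.val = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        r.val += tape[a[i]].val * b[i];
        r.parents.push_back(Entry{a[i], b[i]});  // d(a.b)/da_i = b_i
    }
    tape.push_back(r);
    return (int)tape.size() - 1;
}

// Reverse sweep: seed the output adjoint, push partials backward.
void grad(int out) {
    adj.assign(tape.size(), 0.0);
    adj[out] = 1.0;
    for (int i = (int)tape.size() - 1; i >= 0; --i)
        for (std::size_t j = 0; j < tape[i].parents.size(); ++j)
            adj[tape[i].parents[j].slot] += tape[i].parents[j].partial * adj[i];
}
```

The same trick scales up: matrix products, determinants, and inverses each become one tape record with analytic adjoints instead of thousands of scalar nodes.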

Luckily, we don’t have too many matrix ops. They only show up in multivariate distros. It was a huge pain to get an unconstrained representation of correlation and covariance matrices. We used the Lewandowski, Kurowicka and Joe vines construction, which gives you (k choose 2) parameters for a correlation matrix and another k for a covariance matrix. I’m becoming quite adept at complex Jacobians (we also needed to finesse simplexes to k-1 unconstrained values and ordered scalars for ordinal logit cutpoints).
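For concreteness, here’s one standard stick-breaking map from k−1 unconstrained reals to a k-simplex (a sketch only; our actual transform and its Jacobian bookkeeping may differ in details):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stick-breaking: each unconstrained y_i is squashed through an inverse
// logit to give the fraction of the remaining probability mass to peel off.
std::vector<double> simplex(const std::vector<double>& y) {
    std::size_t k = y.size() + 1;
    std::vector<double> x(k);
    double remaining = 1.0;
    for (std::size_t i = 0; i + 1 < k; ++i) {
        double z = 1.0 / (1.0 + std::exp(-y[i]));  // inverse logit
        x[i] = remaining * z;
        remaining -= x[i];
    }
    x[k - 1] = remaining;  // the last stick gets whatever is left
    return x;
}
```

With y = (0, 0), the map gives (0.5, 0.25, 0.25): always positive, always summing to one, with no constraints left for the sampler to worry about.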

We also had to implement our own log probability functions. We needed something like normal(y, mu, sigma) to allow any or all of y, mu, and sigma to be auto-dif variables. The standard libs just have a single template parameter, which is again wasteful. We also added lots of other useful functions like log_sum_exp, softmax, log gamma, and so on.
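A sketch of the kind of signature this calls for (illustrative only, not our actual implementation): each argument gets its own template parameter, so only the arguments that are actually auto-dif variables pay the auto-dif cost, and the return type is promoted via decltype.

```cpp
#include <cassert>
#include <cmath>

// Each of y, mu, sigma is templated independently, so any mix of plain
// doubles and auto-dif variables works without promoting the rest.
// A real version would rely on ADL to find log() for the AD types.
template <typename T_y, typename T_mu, typename T_sigma>
auto normal_log(const T_y& y, const T_mu& mu, const T_sigma& sigma)
    -> decltype((y - mu) / sigma) {
    using std::log;
    const double LOG_SQRT_2PI = 0.5 * std::log(2.0 * 3.14159265358979323846);
    auto z = (y - mu) / sigma;
    return -0.5 * z * z - log(sigma) - LOG_SQRT_2PI;
}
```

With all-double arguments this collapses to ordinary arithmetic; swap any argument for an auto-dif type and only that slice of the computation hits the tape.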

I’ve so far had good luck with ADOL-C. eigen is flexible enough that it doesn’t mind working with class-type objects (adoubles) instead of primitives (doubles), so integrating the two (ADOL-C and eigen) hasn’t been an issue. ADOL-C’s major drawback is its large memory footprint (I force it to save the tape in memory rather than on disk), but I recently found a patch at http://www.sc.rwth-aachen.de/willkomm/ that replaces ADOL-C’s memory manager with something a bit more modern. The site I linked to also has a presentation (PDF) benchmarking a few AD packages, which might be of interest.

As for the performance hit that comes with using a tape, I’ve noticed that ADOL-C, and perhaps the operator-overloading/tape method in general (I’m not a computer scientist), has a high fixed cost in terms of speed but a relatively low marginal cost for calculating additional higher-order tensors. For example, I’m using Girolami and Calderhead’s MALA, which relies on the third derivative of the log probability function. Their paper includes an approximation (constant curvature) that only requires the Fisher information matrix, so I implemented that first. However, there appears to be very little speed difference between stopping at the second derivative and going all the way to full MALA.

Any other success stories out there?

eigen is awesome, but linking eigen to AD packages has proved to be very tricky (even their unsupported/AutoDiff package). Do you have any updates about which set of tools you settled on? I’m eager to find something that can play nicely with eigen.

CppAD’s notion of “taping” seems really clunky. Isn’t it going to be very inefficient? Here’s what it looks like in CppAD:

CppAD reverse automatic differentiation example

I really like the clean integration of operator overloading in Sacado:

Sacado reverse automatic differentiation example

But then how inefficient is the operator overloading in Sacado and other such packages? I just don’t know C++ that well.

The function is cleanly separated from any of the Sacado bits. Maybe that’s achievable with CppAD with a little encapsulation — I’m still not sure exactly how all these things work.

It makes me wonder if I can just install something like the templated Boost C++ libraries and be done with it.

Or if I can just pass in a vector to forward mode, like you see in the Sacado forward example Fad_demo.

Installing this stuff’s such a pain compared to Java — it’s like the 1980s all over again with path-guessing and scripts to configure scripts to build platform-specific executables.

Linking with R isn’t the problem. We (and by that I mean the people I’m working with) are all over that. We just don’t want to spend our lives hand-coding derivatives.

The term “imputation” is usually used for missing data, such as a non-response in a survey. You can certainly set up a Bayesian model to do multilevel multiple imputation and then use Hamiltonian MC to sample from the posterior. In fact, it’s one of the applications we’re exploring. There’s a nice discussion in Gelman and Hill’s multilevel regression book, with BUGS used for sampling (it’s a general Gibbs sampler).
