Thanks for the ref — I hope it defines Krylov spaces!

Maybe I’m not understanding the issue, but for L1 priors, the generative model is the same, just with a Laplace prior in place of the Gaussian. You can plug in any probability distribution over vectors at that point in the model.
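To make that concrete, here’s a tiny sketch (my own toy code, hypothetical function names) of how swapping the Gaussian prior for a Laplace prior turns the quadratic (ridge) penalty in the negative log posterior into an absolute-value (lasso) penalty:

```python
import numpy as np

def neg_log_gaussian_prior(beta, sigma=1.0):
    # -log N(beta | 0, sigma^2), constants dropped: quadratic (L2) penalty
    return np.sum(beta ** 2) / (2.0 * sigma ** 2)

def neg_log_laplace_prior(beta, b=1.0):
    # -log Laplace(beta | 0, b), constants dropped: absolute-value (L1) penalty
    return np.sum(np.abs(beta)) / b

beta = np.array([0.5, -1.0, 2.0])
l2 = neg_log_gaussian_prior(beta)  # ridge-style contribution to the objective
l1 = neg_log_laplace_prior(beta)   # lasso-style contribution to the objective
```

Either way, MAP estimation is just penalized maximum likelihood with the corresponding penalty added to the negative log likelihood.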

I have no idea if you could set it up as an entropic prior.

Elastic net post coming up tomorrow.

> methods like L-BFGS are popular in computational linguistics

That’s the kind of thing I was going to suggest, so I’m glad I asked.

> they forego direct computation of the inverse Hessian matrix

Suddenly I’m worried that I misunderstood and that you need the full inverted matrix rather than its application. If so, my apologies, and I have nothing new to suggest.
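In case it helps either way: if what you need is the *application* of the inverse, a plain conjugate gradient loop (a generic sketch of the Krylov idea, not Gibbs’s specific algorithm) computes H^{-1}g using only Hessian-vector products, never forming H^{-1} or even H explicitly:

```python
import numpy as np

def cg_solve(hessian_vec, g, tol=1e-10, max_iter=100):
    """Approximate H^{-1} g by conjugate gradients, given only a
    function computing Hessian-vector products H v (H assumed SPD)."""
    x = np.zeros_like(g)
    r = g - hessian_vec(x)   # residual g - H x
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(max_iter):
        Hp = hessian_vec(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# toy SPD "Hessian" and gradient, just to exercise the loop
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
step = cg_solve(lambda v: H @ v, g)  # the Newton step H^{-1} g
```

Each iteration only touches H through one matrix-vector product, which is why this scales to huge sparse problems.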

In any case, since you are a fan of MacKay, you might like this Krylov space technique by Mark Gibbs, a student of his:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.8224

which handles the termination criterion nicely. I’ve used it successfully on sparse problems to optimize hyperparameters (the variance, among other things). You could modify it to generate (approximate) samples from the posterior. That said, I love Monte Carlo too, and that’s probably the way to go.

I’ve also considered using L1 priors, but shied away from it because I wasn’t sure what the corresponding generative model or maximum entropy principle is.

Looking forward to your post on the elastic net prior.

Just this past weekend, I was reading about Radford Neal’s Hamiltonian Monte Carlo in MacKay’s *Information Theory, Inference, and Learning Algorithms*, Section 41.4 (I love this book’s frank and fun discussions of methods). Not surprisingly, given the name, it uses sampling to estimate posterior (co)variance.
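For concreteness, here’s a minimal HMC sketch along the lines of MacKay’s presentation (a toy version of my own, not Neal’s full method): leapfrog integration of the Hamiltonian dynamics, followed by a Metropolis accept/reject on the total energy, run on a standard normal target:

```python
import numpy as np

def hmc_sample(U, grad_U, q0, n_samples=2000, eps=0.2, n_leapfrog=20, seed=0):
    """Toy Hamiltonian Monte Carlo. U is the negative log density
    (the potential energy); grad_U is its gradient."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(q.shape)  # resample momentum each iteration
        q_new, p_new = q.copy(), p.copy()
        # leapfrog: half step in p, alternating full steps, half step in p
        p_new -= 0.5 * eps * grad_U(q_new)
        for _ in range(n_leapfrog - 1):
            q_new += eps * p_new
            p_new -= eps * grad_U(q_new)
        q_new += eps * p_new
        p_new -= 0.5 * eps * grad_U(q_new)
        # Metropolis correction on the change in total energy H = U + K
        dH = (U(q_new) + 0.5 * p_new @ p_new) - (U(q) + 0.5 * p @ p)
        if np.log(rng.random()) < -dH:
            q = q_new
        samples.append(q.copy())
    return np.array(samples)

# standard normal target: U(q) = q^2 / 2, grad U = q
samples = hmc_sample(U=lambda q: 0.5 * q @ q, grad_U=lambda q: q,
                     q0=np.zeros(1))
```

The sample mean and variance should come out near 0 and 1, which is exactly the sampling-based (co)variance estimate mentioned above.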

Interesting discussion. I’m reading your post more than a year late, but it occurred to me, regarding your last paragraph, that there are good inversion techniques for huge sparse matrices. Some are used in Bayesian methods, unsurprisingly, for variance estimation in Gaussian processes.
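As a generic illustration (toy code, not the specific GP algorithms I have in mind): a single diagonal entry of the inverse of a sparse precision matrix, i.e. one marginal variance, falls out of a single sparse Krylov solve, with no explicit inversion:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# sparse SPD "precision" matrix: tridiagonal, n x n
n = 500
A = diags([-1.0, 2.5, -1.0], offsets=[-1, 0, 1], shape=(n, n)).tocsc()

# the variance of component i is [A^{-1}]_{ii};
# solve A x = e_i and read off x[i] instead of inverting A
i = 250
e_i = np.zeros(n)
e_i[i] = 1.0
x, info = cg(A, e_i)  # info == 0 on convergence
variance_i = x[i]
```

The full inverse of a sparse matrix is generally dense, so solving for the handful of entries you actually need is the only practical option at scale.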

If this post is still alive, I’d be happy to cite some algorithms. (The Krylov space ones are particularly appealing.)

All the best.
