http://nlp.stanford.edu/software/classifier.shtml

Like most applications in natural language processing, they allow different numbers of features per category.

The priors work exactly the same way, with one per dimension.
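
To make "one per dimension" concrete, here's a minimal sketch of a zero-mean Gaussian prior whose variance varies by feature dimension. The class and method names are made up for illustration; this isn't any particular package's API.

    // Illustrative only: each feature dimension i gets its own prior
    // variance, so the prior's gradient contribution varies by dimension.
    public class PerDimensionGaussianPrior {

        private final double[] priorVariances; // one variance per dimension

        public PerDimensionGaussianPrior(double[] priorVariances) {
            this.priorVariances = priorVariances;
        }

        // log prior for dimension i is -w*w / (2 * var_i) + const,
        // so its gradient with respect to w is -w / var_i.
        public double logPriorGradient(int i, double w) {
            return -w / priorVariances[i];
        }
    }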

Although they don’t allow features varying by category, you can check out the BMR package and related papers for some uses of priors for natural language that vary by dimension:

http://www.bayesianregression.org/

For much earlier refs with applications in economics, check out my blog post:

Cheers.

It turns out that I rediscovered the “truncated stochastic gradient” method of Langford, Li and Zhang, for which there are NIPS and arXiv versions of the paper:

http://books.nips.cc/papers/files/nips21/NIPS2008_0280.pdf

http://arxiv.org/abs/0806.4686

I thought I was just implementing the industry-standard stochastic gradient for a Laplace prior. In fact, I thought I was borrowing the idea from Genkin, Lewis and Madigan, but they said they were truncating Laplace priors for other reasons, and that the truncated-gradient idea surfaced even earlier in Zhang and Oles’ IR Journal paper:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.5553

I called it “clipped regularization” in the white paper (section 10.6).
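
For anyone who hasn't seen the trick, here's a minimal sketch of a clipped stochastic-gradient update for a Laplace (L1) prior in the spirit described above. It's not LingPipe's actual code, and all the names are made up for illustration; the point is just that the prior's penalty step is truncated at zero, so a weight can be driven to exactly 0.0 but never pushed past it.

    // Sketch only: one clipped SGD step for a Laplace (L1) prior.
    public class ClippedLaplaceSgd {

        // weights:            current weight vector, updated in place
        // likelihoodGradient: gradient of the log likelihood for this example
        // learningRate:       step size (eta)
        // priorScale:         L1 penalty strength (lambda)
        static void update(double[] weights,
                           double[] likelihoodGradient,
                           double learningRate,
                           double priorScale) {
            double penalty = learningRate * priorScale;
            for (int i = 0; i < weights.length; ++i) {
                // likelihood step
                double w = weights[i] + learningRate * likelihoodGradient[i];

                // Laplace-prior step: move w toward zero by eta * lambda,
                // but clip at zero rather than letting it cross over.
                if (w > 0.0) {
                    w = Math.max(0.0, w - penalty);
                } else if (w < 0.0) {
                    w = Math.min(0.0, w + penalty);
                }
                // at w == 0.0 the subgradient is taken as 0, so no penalty step

                weights[i] = w;
            }
        }
    }

The clipping is what keeps the update well behaved even though the prior isn't differentiable at zero: instead of oscillating across the origin, small weights get pinned to exactly zero, which is also what produces sparse solutions.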

Could you give a few words on how the Laplace prior works in LingPipe? You mention you use gradient descent, but the Laplace prior isn’t differentiable at every point. How did you work around that?

Thanks!
