I suggest you actually try to calibrate some real naive Bayes output. What you’ll find is that with 100+ word documents, all the predictions go to 0.999 and 0.001, and you get really strong overdispersion effects (which logistic regression also can’t correct; for that you need to transform the word vectors, or build a proper negative-binomial-type model along the lines of Mosteller and Wallace’s 1964 Bayesian classifier!).
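A minimal sketch of the saturation effect (hypothetical log-likelihood ratios, nothing to do with LingPipe): each word contributes its log-likelihood ratio to the document’s log-odds, so the total grows linearly with document length and the posterior pins to the extremes well before 100 words.

```python
import math

# Hypothetical per-word log P(w | class1) / P(w | class0) values.
llr = {"free": 0.4, "meeting": -0.3, "offer": 0.5}

def nb_posterior(words, prior_log_odds=0.0):
    """Naive Bayes posterior P(class1 | doc): sum of per-word log-odds."""
    log_odds = prior_log_odds + sum(llr[w] for w in words)
    return 1.0 / (1.0 + math.exp(-log_odds))

short_doc = ["free", "offer"] * 2    # 4 words: moderate posterior
long_doc = ["free", "offer"] * 50    # 100 words: posterior pinned at 1

print(nb_posterior(short_doc))   # around 0.86
print(nb_posterior(long_doc))    # 0.999...
```

The point is structural: summing 100 per-word terms gives log-odds far outside the range where the sigmoid is anywhere but 0 or 1.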

You can perform heuristic inference with a Bayesian posterior mean just as well as with a maximum likelihood estimate. You’ll find the Bayesian approach a bit more robust, but you can probably get most of the way there, without much loss, by regularizing heavily.
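To make the posterior-mean vs. MLE point concrete, here’s a sketch for a multinomial word model (toy counts, my own illustration): under a symmetric Dirichlet(alpha) prior, the posterior mean is (count + alpha) / (total + alpha * V), which is add-one smoothing when alpha = 1; “regularizing heavily” corresponds to a larger alpha.

```python
# Toy word counts for one class (hypothetical numbers).
counts = {"free": 3, "offer": 1, "meeting": 0}
total = sum(counts.values())
V = len(counts)  # vocabulary size

def mle(word):
    """Maximum likelihood estimate: raw relative frequency."""
    return counts[word] / total

def posterior_mean(word, alpha=1.0):
    """Posterior mean under a symmetric Dirichlet(alpha) prior."""
    return (counts[word] + alpha) / (total + alpha * V)

print(mle("meeting"))             # 0.0 -- a zero count zeroes the MLE
print(posterior_mean("meeting"))  # positive under the prior
```

The robustness difference shows up exactly at the zero counts: the MLE assigns them probability zero, while the posterior mean never does.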

P.S. My second paper on stats (circa 1998) used exactly this kind of post-calibration on an SVD-reduced representation of word vectors.

Thanks for the suggestion of the hierarchical prior. I was hoping for a method that doesn’t require sampling, because prediction speed is crucial in my case. The variational approximations aren’t that fast either, as far as I know?

No, you cannot post-calibrate.

I’d suggest something like a logistic regression with a hierarchical prior. The problem there is that the posterior isn’t conjugate, so you need MCMC, Laplace approximations, or variational approximations, none of which are implemented in LingPipe.
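A rough sketch of the non-hierarchical starting point (my own toy code, not LingPipe): a zero-mean Gaussian prior on the weights is just L2 regularization, and the MAP mode can be found with plain gradient ascent. A Laplace approximation would then use the curvature at this mode, and the hierarchical version would additionally learn the prior variance rather than fixing it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_map(xs, ys, dim, prior_var=1.0, lr=0.1, iters=2000):
    """MAP logistic regression under a N(0, prior_var) prior on each weight."""
    w = [0.0] * dim
    for _ in range(iters):
        # Gradient of the log prior: -w / prior_var.
        grad = [-wj / prior_var for wj in w]
        # Gradient of the log likelihood: sum of (y - p) * x.
        for x, y in zip(xs, ys):
            err = y - sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(dim):
                grad[j] += err * x[j]
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

# Tiny separable toy problem: features are (bias, x).
xs = [(1.0, -2.0), (1.0, -1.0), (1.0, 1.0), (1.0, 2.0)]
ys = [0, 0, 1, 1]
w = fit_map(xs, ys, dim=2)
print(w)
```

Note that on separable data like this the unregularized MLE diverges; the Gaussian prior is what keeps the weights finite, which is one practical reason to prefer the Bayesian setup.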

If the posteriors are not calibrated, then calibration techniques can be used to obtain meaningful credible intervals, correct?

Given the failure of the independence assumptions under naive Bayes, the posteriors will **NOT** be well calibrated. So there’s not much use in computing them.
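A sketch of why violated independence wrecks calibration (illustrative numbers of my own): if two words are perfectly correlated, naive Bayes counts the same evidence twice, doubling the log-odds, so the reported posterior is far more extreme than the true one.

```python
import math

def posterior(log_odds):
    return 1.0 / (1.0 + math.exp(-log_odds))

evidence = 1.5                       # true log-likelihood ratio of one cue
true_p = posterior(evidence)         # what a calibrated model should report
naive_p = posterior(2 * evidence)    # naive Bayes with the cue duplicated

print(true_p, naive_p)   # roughly 0.82 vs. 0.95
```

With real documents the effect compounds across many correlated words, which is exactly the saturation to 0.999/0.001 described above.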

I think your approach is neater, while mine is more general: it can mechanically convert any recursive function into this style, without any extra thought.
