Ethics, logistic regression, and 0-1 loss

December 27, 2016

Andrew Gelman and David Madigan wrote a paper on why 0-1 loss is so problematic:

This is related to the issue of whether one should be training on an artificial gold standard. Suppose we have a bunch of annotators and we don’t have perfect agreement on items. What do we do? Well, in practice, machine learning evals tend to either (1) throw away the examples without agreement (e.g., the RTE evals, some BioCreative named-entity evals, etc.), or (2) go with the majority label (everything else I know of). Either way, we are throwing away a huge amount of information by reducing the labels to artificial certainty. You can see this pretty easily with simulations, and Raykar et al. showed it with real data.
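To make the simulation point concrete, here’s a minimal sketch (my own toy setup, not from the post or from Raykar et al.): five simulated annotators label items whose true positive-class probability depends on a single feature, and a logistic regression fit to the majority-vote labels comes out more overconfident than one fit to the individual annotations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulate items with a true probability of the positive label,
# then five noisy annotators who each label according to that probability.
n_items, n_annotators = 2000, 5
x = rng.normal(size=(n_items, 1))
true_prob = 1.0 / (1.0 + np.exp(-2.0 * x[:, 0]))  # true P(label = 1 | x)
annotations = rng.binomial(1, true_prob[:, None], size=(n_items, n_annotators))

# (1) Majority-vote "gold standard": collapse the annotations to a hard 0/1 label.
majority = (annotations.mean(axis=1) > 0.5).astype(int)
hard_model = LogisticRegression().fit(x, majority)

# (2) Keep the information: train on every (item, annotation) pair, which
# amounts to weighting each item by its fraction of positive votes.
x_rep = np.repeat(x, n_annotators, axis=0)
soft_model = LogisticRegression().fit(x_rep, annotations.ravel())

# The majority-vote model's probabilities get pushed toward 0 and 1 relative
# to the true item-level probabilities; the per-annotation model tracks them better.
for name, model in [("majority-vote", hard_model), ("per-annotation", soft_model)]:
    p = model.predict_proba(x)[:, 1]
    print(f"{name:15s} mean |p_hat - p_true| = {np.mean(np.abs(p - true_prob)):.3f}")
```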

Yet 0/1 corpora and evaluation remain the gold standard (pun intended) in most machine learning evals. Kaggle has gone largely to log loss, but even that’s very flat around the middle of the range, as Andrew discusses in this blog post:
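One way to read “flat around the middle”: for a fixed amount of miscalibration, the expected log loss penalty is smaller when the true probability is near 0.5 than when it is near the extremes. A quick back-of-the-envelope check (my numbers, not Andrew’s):

```python
import numpy as np

def expected_log_loss(p_true, q_pred):
    # Expected log loss (in nats) of predicting q when the true
    # probability of the positive label is p.
    return -(p_true * np.log(q_pred) + (1 - p_true) * np.log(1 - q_pred))

# Being off by 0.10 in the middle of the range vs. near the extreme.
for p, q in [(0.5, 0.6), (0.95, 0.85)]:
    extra = expected_log_loss(p, q) - expected_log_loss(p, p)
    print(f"true p = {p}: predicting {q} costs an extra {extra:.3f} nats")
# Roughly 0.02 extra nats in the middle vs. roughly 0.05 near the extreme.
```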

The problem is that it’s very hard to train a model that’s well calibrated if you reduce the data to an artificial gold standard. If you don’t know what I mean by calibration, check out this paper:

It’s one of my favorites. Once you’ve understood it, it’s hard not to think of evaluating models in terms of calibration and sharpness.
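For readers who haven’t seen the terms, here’s a rough operational gloss (my shorthand, not the paper’s formal definitions): calibration asks whether events you predict with probability p actually occur about a fraction p of the time; sharpness asks how far the predictions stray from the base rate. A crude binned check might look like this:

```python
import numpy as np

def reliability_report(y_true, p_pred, n_bins=10):
    """Crude binned reliability check: within each bin of predicted
    probability, compare the mean prediction to the observed frequency.
    As a rough proxy, report the variance of the predictions as sharpness."""
    y_true, p_pred = np.asarray(y_true), np.asarray(p_pred)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    which = np.clip(np.digitize(p_pred, edges) - 1, 0, n_bins - 1)
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            print(f"predicted ~{p_pred[mask].mean():.2f}   "
                  f"observed {y_true[mask].mean():.2f}   (n={mask.sum()})")
    print(f"sharpness proxy (variance of predictions): {p_pred.var():.3f}")
```

Note that a calibrated but unsharp model can pass the binned check while predicting the base rate for every item, which is why the usual advice is to maximize sharpness subject to calibration.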