Ethics, logistic regression, and 0-1 loss


Andrew Gelman and David Madigan wrote a paper on why 0-1 loss is so problematic:

This is related to the issue of whether one should be training on an artificial gold standard. Suppose we have a bunch of annotators and we don’t have perfect agreement on items. What do we do? Well, in practice, machine learning evals tend to either (1) throw away the examples without agreement (e.g., the RTE evals, some biocreative named entity evals, etc.), or (2) go with the majority label (everything else I know of). Either way, we are throwing away a huge amount of information by reducing the label to artificial certainty. You can see this pretty easily with simulations, and Raykar et al. showed it with real data.

Yet 0/1 corpora and evaluation remain the gold standard (pun intended) in most machine learning evals. Kaggle has gone largely to log loss, but even that’s very flat around the middle of the range, as Andrew discusses in this blog post:

The problem is that it’s very hard to train a model that’s well calibrated if you reduce the data to an artificial gold standard. If you don’t know what I mean by calibration, check out this paper:

It’s one of my favorites. Once you’ve understood it, it’s hard not to think of evaluating models in terms of calibration and sharpness.

Tags: , , ,

2 Responses to “Ethics, logistic regression, and 0-1 loss”

  1. Carl Thomé Says:

    The link to Gelman’s blog post is broken (at least for me).

  2. Bob Carpenter Says:

    Thanks for the heads up. I fixed the link because it’s well worth reading.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s