In a previous post, I summarized the workshop for the i2b2 Obesity Challenge. I’ve finally uploaded our presentation:

- Carpenter, Bob, Breck Baldwin, Carlos Cano, and Leon Peshkin. 2008. Known Unknowns in Discharge Summary Mining. Talk presented to the AMIA Second i2b2 Shared-Task and Workshop Challenges in Natural Language Processing for Clinical Data. Obesity Challenge (A Shared-Task on Obesity): Who’s obese and what co-morbidities do they (definitely/likely) have? Washington, D.C.

I tried to explain how we’d misunderstood the task definition. Let’s consider just the textual task, and just the congestive-heart-failure (CHF) co-morbidity. The task was to classify patient discharge summaries (dozens to hundreds of semi-structured sentences about a patient and their history, meds, and course of treatment) into one of four categories: (YES) the text stated the person had CHF, (NO) the text stated the person did not have CHF, (QUESTIONABLE) the text stated that it was questionable whether the person had CHF, and (UNKNOWN) the text said nothing about the patient’s CHF status. The intuitive form of the CHF task removed the UNKNOWN category and asked the annotators to use their best judgement rather than requiring explicit statements in the text (annotators disagreed on how explicit things had to be to count as evidenced by the text).

The organizers argued that the key to the eval was getting the questionable categories right (I think that’s the “likely” in the title), because they were important. I somehow think that if patients with questionable disease statuses were so important, something would go in their chart to that effect. As is, there were 39 questionable cases (across all 16 co-morbidities) in 11,630 cases. Clearly the clinicians didn’t feel questionable conclusions (or for that matter, negative conclusions, of which there were only 87 instances across all 16 co-morbidities) were worth recording.

The point of my talk (after describing how we used regularized sparse logistic regression with some clever feature extraction for drug names and negation by Carlos and Leon) was more philosophical (and doesn’t necessarily represent the views of my co-authors, who didn’t have a chance to review my “late-breaking” presentation).

To my way of thinking, the task is really a quantization of four possible states of posterior information. But first, let’s look at the prior, which summarizes our uncertainty about the population distribution p(CHF) for congestive heart failure (I’m just making these distributions up from simple betas — they’re not intended to be realistically shaped).

What I’d like to do is gather some data, like a discharge summary Text, and infer a posterior p(CHF|Text). For instance, the following posteriors are likely candidates to describe as NO and YES:

whereas the following would constitute an unknown case:

Note that the posterior p(CHF|Text) in the UNKNOWN case is the same as the prior p(CHF), indicating the text didn’t give us any information about CHF. In the QUESTIONABLE case, on the other hand, the posterior p(CHF|Text) is both broad and centered near 0.5, indicating a high degree of uncertainty about whether the patient has CHF given the text.
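To make the quantization idea concrete, here is a rough sketch of how a beta posterior might be mapped onto the four labels. The function `quantize`, its thresholds (`tol`, `lo`, `hi`), and the YES/NO example posteriors are all made up for illustration; only the beta(20,100) prior and beta(10,10) questionable posterior are the ones behind the plots (see the comments below). This is not what our system did.

```python
from math import sqrt

def beta_mean(a, b):
    """Mean of a beta(a, b) distribution."""
    return a / (a + b)

def beta_sd(a, b):
    """Standard deviation of a beta(a, b) distribution."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def quantize(prior, posterior, tol=0.02, lo=0.1, hi=0.9):
    """Map a beta posterior (a, b) to one of the four labels.

    tol, lo, and hi are arbitrary illustrative thresholds.
    """
    # UNKNOWN: the text left the prior essentially unchanged.
    same_mean = abs(beta_mean(*posterior) - beta_mean(*prior)) < tol
    same_sd = abs(beta_sd(*posterior) - beta_sd(*prior)) < tol
    if same_mean and same_sd:
        return "UNKNOWN"
    # Otherwise label by where the posterior mass is centered.
    m = beta_mean(*posterior)
    if m <= lo:
        return "NO"
    if m >= hi:
        return "YES"
    return "QUESTIONABLE"

prior = (20, 100)                  # prior used for the plots
print(quantize(prior, (20, 100)))  # posterior equals prior
print(quantize(prior, (10, 10)))   # broad, centered at 0.5
print(quantize(prior, (50, 2)))    # hypothetical posterior near 1
print(quantize(prior, (2, 50)))    # hypothetical posterior near 0
```

Any such thresholding throws away most of the posterior, which is part of why the four-way labeling feels like a lossy summary of what the text actually told us.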

While this view of information is quite pleasing theoretically, and seems promising for practical applications in that it’s easy to combine with other information, it remains a difficult paradigm to evaluate in a bakeoff situation.

December 3, 2008 at 9:34 pm

Intuitively I see what you want to do, but I’m not sure it’s possible.

Can it actually happen in Bayesian inference that the posterior has broader tails than the prior? I don’t think this can happen with the Dirichlet-Multinomial, for instance. Because the likelihood can only add positive counts to the prior, the posterior has to be more peaked than the prior (maybe this only holds for alpha > 1?).

Hmm: I wonder if this is a general property of exponential families. So in order to get a posterior that is less peaked than the prior we’d need to find an exponential family where increasing the sufficient statistics by adding the log likelihood results in a less peaked distribution. I wonder if there are any such families of distributions?

December 4, 2008 at 2:41 pm

That should’ve dawned on me when I generated the above graphs using a beta (2D Dirichlet), where the questionable posterior was beta(10,10) whereas the prior was beta(20,100). Doh!

Despite always having more certainty about the parameter, the certainty about the patient having CHF can still go down. For instance, a posterior of beta(120,120) is right at 50% mean, so is less sure than the prior about CHF in terms of entropy of CHF outcome, though more sure about where the parameter’s at.
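The arithmetic behind that last point can be checked with a few lines of Python. This is just a sketch with helper names of my own choosing; it compares the beta(20,100) prior with the beta(120,120) posterior, using the standard deviation of the parameter and the entropy (in bits) of the CHF outcome at the distribution's mean.

```python
from math import log2, sqrt

def beta_mean(a, b):
    """Mean of beta(a, b); also the predictive probability of CHF."""
    return a / (a + b)

def beta_sd(a, b):
    """Standard deviation of the parameter under beta(a, b)."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def outcome_entropy(p):
    """Entropy in bits of a yes/no outcome with probability p."""
    return -(p * log2(p) + (1 - p) * log2(1 - p))

for name, (a, b) in [("prior", (20, 100)), ("posterior", (120, 120))]:
    p = beta_mean(a, b)
    print(f"{name}: mean={p:.3f} sd={beta_sd(a, b):.4f} "
          f"outcome entropy={outcome_entropy(p):.3f} bits")
```

The posterior's standard deviation comes out slightly smaller than the prior's (more certainty about the parameter), while the outcome entropy rises from roughly 0.65 bits to a full 1 bit (less certainty about whether the patient has CHF).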