Here’s a link to the slides of the talk I presented recently at ICML:
- Carpenter, Bob. 2011. Sampling, Modeling and Measurement Error in Inference from Clinical Text. Invited talk at the ICML 2011 Workshop on Learning from Unstructured Clinical Text. Bellevue, Washington, USA.
It’s basically a list of the kinds of things that can go wrong and introduce error (bias and noise) into inferences. Although the examples are mostly clinical (with one on baseball and one on cancer clusters), the point is generally applicable.
Small, Focused Workshops
I really like small, focused workshops, and this one was very good, with lots of presentations on people’s practical experiences launching systems in hospitals and working on fascinating text mining problems from clinical notes.
Thanks to the Organizers
Thanks again to the organizers, especially Faisal Farooq, who handled all the paperwork. It’s a pretty thankless job in my experience, but having done it myself, I can really appreciate how much work it is to run something that comes off smoothly.
I don’t know how long the page will last, but here’s a link to the workshop itself:
Unintended (Beneficial) Consequences
When Noémie Elhadad invited me to give a talk, I met with her to see if there was a topic I could talk about. During that meeting, she mentioned how hard it had been to hire an NLP programmer in biomedical informatics (it’s just as hard if not harder at a small company). The upshot is that Mitzi got a new job at Columbia into the bargain. In a way, it’s too bad, because I miss talking to Mitzi about her work in genomics, about which I know relatively little compared to NLP.
August 18, 2011 at 7:41 pm
Any chance there is some video around of this talk? The content looks excellent. I’d love more explanation to understand it better.
August 23, 2011 at 5:49 pm
Sorry, but the session wasn’t videotaped. You can find discussions of sampling error in any stats textbook (my first example came from Gelman et al.’s Bayesian Data Analysis). Texts on survey sampling seem to treat the problem most generally. You’ll also find measurement error models in survey sampling books.
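To make the sampling error point concrete, here's a minimal sketch in Python. It is not taken from the slides; the group sizes, the 10% rate, and the function names are all made up for illustration. Every group shares the same underlying rate, so any difference in the observed rates is pure sampling error, and the small groups should show a much wider spread.

```python
import random
import statistics


def observed_rates(group_size, n_groups, true_rate, rng):
    """Simulate n_groups groups of group_size people, each person being a
    'case' with probability true_rate, and return each group's observed rate."""
    rates = []
    for _ in range(n_groups):
        cases = sum(rng.random() < true_rate for _ in range(group_size))
        rates.append(cases / group_size)
    return rates


def compare_spread(seed=42, true_rate=0.1, n_groups=200):
    """Return observed rates for small and large groups that share one true rate.
    All sizes and rates here are hypothetical, chosen only for illustration."""
    rng = random.Random(seed)
    small = observed_rates(20, n_groups, true_rate, rng)
    large = observed_rates(2000, n_groups, true_rate, rng)
    return small, large


if __name__ == "__main__":
    small, large = compare_spread()
    print("sd of observed rates, small groups:", statistics.pstdev(small))
    print("sd of observed rates, large groups:", statistics.pstdev(large))
    print("highest observed rate, small groups:", max(small))
    print("highest observed rate, large groups:", max(large))
```

If you rank groups by observed rate, the top and the bottom of the list will be dominated by small groups even though nothing distinguishes them except size.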
Model specification error is the dirty little secret of both Bayesian and frequentist stats. You often see this tested (for instance, generating data with one model and fitting with another), but rarely discussed in any generality.
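Here's a rough sketch of that kind of test, assuming a deliberately simple setup that I'm making up for illustration: the data are generated from an exponential distribution, but the model fit to them is a normal. The function names, the threshold, and the sample size are all hypothetical. The quantity of interest is a tail probability, which is where the mismatch between the two models shows up.

```python
import math
import random
import statistics


def tail_prob_under_normal_fit(data, threshold):
    """Fit a normal by maximum likelihood (sample mean, population sd) and
    return the fitted model's probability of exceeding threshold."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return 1.0 - statistics.NormalDist(mu, sigma).cdf(threshold)


def simulate(n=10_000, threshold=3.0, seed=0):
    """Generate data from Exponential(1) (the 'true' model in this sketch),
    fit a normal (the misspecified model), and compare tail probabilities."""
    rng = random.Random(seed)
    data = [rng.expovariate(1.0) for _ in range(n)]
    truth = math.exp(-threshold)  # exact P(X > threshold) for Exponential(1)
    empirical = sum(x > threshold for x in data) / n
    fitted = tail_prob_under_normal_fit(data, threshold)
    return truth, empirical, fitted


if __name__ == "__main__":
    truth, empirical, fitted = simulate()
    print("true tail probability:      ", truth)
    print("empirical tail frequency:   ", empirical)
    print("normal-fit tail probability:", fitted)
```

More data doesn't help here. The empirical frequency converges to the true tail probability, but the normal fit converges to the wrong answer, because the error comes from the model and not from the sample.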