Update: The talk is still on a Wednesday, but the correct date is 16 December.
This Wednesday, I (Bob) will be giving a talk at the brand-spanking new Knapp Center for Biomedical Discovery at the University of Chicago.
2 PM, Wed 16 Dec 2009
Knapp Center for Biomedical Discovery (KCBD)
10th Floor, South Conference Room
900 East 57th Street
University of Chicago
Multilevel Models of Coding and Diagnosis
with Multiple Tests and No Gold Standards
Bob Carpenter, Alias-i
I’ll introduce multilevel generalizations of some well-known models from the epidemiology literature and evaluate how well they fit diagnostic testing and linguistic annotation data. The analogy is that data annotation for machine learning (or, e.g., ICD-9 coding) is the same kind of process as diagnostic testing.
The observed data consist of multiple testers supplying results for multiple tested units. True labels may be known for some units (perhaps with selection bias), and not every unit need undergo every test.
In all models, there are parameters for outcome prevalence, the result for each unit, and some form of test accuracy. I’ll also consider models with parameters for individual test accuracy and bias (equivalently sensitivity and specificity in the binary case) and item difficulty/severity.
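As a concrete illustration of these parameters, here is a minimal simulation sketch of the binary case, with fixed prevalence and per-annotator sensitivity/specificity. The parameter values and function names are hypothetical, chosen for illustration; the talk's actual models place priors over these quantities rather than fixing them.

```python
import random

# Hypothetical parameter values, for illustration only.
PREVALENCE = 0.3                    # P(true label is positive)
SENSITIVITY = [0.90, 0.75, 0.85]    # per-annotator P(positive label | true positive)
SPECIFICITY = [0.95, 0.80, 0.90]    # per-annotator P(negative label | true negative)

def simulate(num_units, seed=42):
    """Draw a true label for each unit from the prevalence, then have
    each annotator label it according to their own sensitivity and
    specificity."""
    rng = random.Random(seed)
    truths, labels = [], []
    for _ in range(num_units):
        z = rng.random() < PREVALENCE          # latent true label
        truths.append(z)
        row = []
        for j in range(len(SENSITIVITY)):
            p = SENSITIVITY[j] if z else 1.0 - SPECIFICITY[j]
            row.append(rng.random() < p)       # annotator j's label
        labels.append(row)
    return truths, labels
```

In a fuller model, the sensitivity/specificity pairs would themselves be drawn per annotator from a population distribution, and item difficulty would perturb them per unit.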
I’ll focus on the priors for annotator accuracy/bias and item difficulty, showing how diffuse hyperpriors allow them to be inferred effectively along with the other parameters using Gibbs sampling. The posterior samples may be used for inference about diagnostic precision, multiple comparisons of test accuracies, population prevalence, unit-level labels, and so on.
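To make the Gibbs-sampling idea concrete, here is a toy sampler for the binary model with flat Beta(1,1) priors on prevalence, sensitivity, and specificity. This is a simplified sketch, not the hierarchical BUGS model from the talk: it omits the hyperpriors and item difficulty, and all names are illustrative.

```python
import math
import random

def gibbs(labels, iters=500, seed=0):
    """Gibbs sampler sketch for the binary annotation model with
    Beta(1,1) priors. `labels` is a list of rows, one per unit,
    each a list of 0/1 annotator labels. Returns the sampled
    prevalence chain."""
    rng = random.Random(seed)
    n, m = len(labels), len(labels[0])
    z = [rng.random() < 0.5 for _ in range(n)]   # initialize latent true labels
    pi_samples = []
    for _ in range(iters):
        # Sample prevalence | z  ~  Beta(1 + #positive, 1 + #negative).
        pos = sum(z)
        pi = rng.betavariate(1 + pos, 1 + n - pos)
        # Sample each annotator's sensitivity and specificity | z, labels,
        # from Beta posteriors over their confusion counts.
        sens, spec = [], []
        for j in range(m):
            tp = sum(1 for i in range(n) if z[i] and labels[i][j])
            fn = sum(1 for i in range(n) if z[i] and not labels[i][j])
            tn = sum(1 for i in range(n) if not z[i] and not labels[i][j])
            fp = sum(1 for i in range(n) if not z[i] and labels[i][j])
            sens.append(rng.betavariate(1 + tp, 1 + fn))
            spec.append(rng.betavariate(1 + tn, 1 + fp))
        # Sample each unit's true label | prevalence and accuracies.
        for i in range(n):
            lp, ln = math.log(pi), math.log(1.0 - pi)
            for j in range(m):
                if labels[i][j]:
                    lp += math.log(sens[j])
                    ln += math.log(1.0 - spec[j])
                else:
                    lp += math.log(1.0 - sens[j])
                    ln += math.log(spec[j])
            top = max(lp, ln)  # stabilize before exponentiating
            p_pos = math.exp(lp - top) / (math.exp(lp - top) + math.exp(ln - top))
            z[i] = rng.random() < p_pos
        pi_samples.append(pi)
    return pi_samples
```

The same chain can also retain the sensitivity/specificity and unit-label draws, which is what supports the multiple-comparison and unit-level inferences mentioned above.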
I’ll show that the resulting multilevel models can be fit using data simulated according to the model, and then fit them to a range of clinical and natural language data. I’ll discuss their advantages for inference with epidemiological data, including dentists diagnosing caries from x-rays, oncologists diagnosing tumors from slides, and infections diagnosed from exams and serum tests, as well as natural language data including name spotting, word stemming, and classification.
I’ll conclude by discussing extensions that allow further pooling through random effects for different testing facilities, different kinds of annotators (e.g., doctors vs. ICD-9 coding specialists), different kinds of subjects/units (e.g., genetic predisposition to disease, or articles drawn from different journals), and so on.
All the software (Java, R, BUGS) and data discussed in this talk are freely available from the LingPipe sandbox in the hierAnno project.
You may already be familiar with all this from the data annotation thread on this blog.