OMOP Cup: Drug Safety Surveillance Bakeoff


[Update, 13 October 2009: David Madigan says they got the wrong legalese the first go-round and are going to replace it.]

David Madigan (co-lead of BMR) and crew have just announced the 2009/2010 OMOP Cup, a short bakeoff aimed at predicting which drugs cause which conditions in patients.

The Coarse Print

There’s no way we’d enter, given the ridiculously restrictive OMOP Cup rules.

The official rules mention source code only in passing and suggest only that you send them a tech report. But the home page says you have to “share your method”.

In addition, you have to transfer the rights to the contest organizers, which means you wouldn’t even own the rights to use your own technique after submitting it! If that’s not enough, the rules also require you to indemnify the organizers against damages (e.g. when a patent troll sues them, claiming your entry infringes its patent).

If you work for a company or for a university, you may not even have the right to reassign your intellectual property this way.

The Two Challenges

There are two predictive challenges, based on the same underlying training data. For full details, see the site above and the challenge overviews:


Data’s out now. Progress prizes ($5K total) will be awarded at the end of November 2009, and grand prizes ($15K total) at the end of March 2010.

Simulated Training Data

Unfortunately, they’re using simulated data. For what it’s worth, here’s OMOP’s call for simulations, so you can figure out some of the basics of how they were planning to simulate.

The basic data sizes are bigger than for Netflix, consisting of roughly:

  • 5K drugs
  • 5K conditions
  • 10M persons
  • 300M condition occurrences over 10 years
  • 90M drug exposures over 10 years
  • 4K positive, 4K negative associations (labeled training data)
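To get a feel for those numbers, here is a bit of back-of-the-envelope arithmetic in Python. The inputs are the rough counts listed above; treating every drug/condition combination as a candidate pair is my own assumption, not something the challenge description states.

```python
# Rough counts taken from the list above (all approximate).
drugs = 5_000
conditions = 5_000
persons = 10_000_000
condition_occurrences = 300_000_000
drug_exposures = 90_000_000
labeled_pairs = 4_000 + 4_000  # positive + negative associations

# Average number of records per person over the 10-year span.
conditions_per_person = condition_occurrences / persons
exposures_per_person = drug_exposures / persons

# Assumption: every drug can be paired with every condition.
candidate_pairs = drugs * conditions
labeled_fraction = labeled_pairs / candidate_pairs

print(f"{conditions_per_person:.0f} condition occurrences per person")
print(f"{exposures_per_person:.0f} drug exposures per person")
print(f"{candidate_pairs:,} candidate pairs, {labeled_fraction:.3%} labeled")
```

Under that assumption, the labeled pairs cover only a tiny slice of the full drug/condition grid.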

The basic observational training data is organized into four DB table dumps:

  • Conditions: start date, person, condition
  • Drug Exposure: start date, end date, person, drug
  • Person Observations: start date, end date, person id, person status (alive/dead), prescription data (yes/no)
  • Person Data: id, birth year, gender (M/F), race (white/non-white)

The labeled training data contains 4000 examples of positive drug-condition associations and 4000 examples of negative associations.
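For anyone who wants to load the dumps, here is a minimal sketch of the four tables and the labeled pairs as Python records. The class names, field names, and field types are my own guesses based on the column descriptions above (integer codes, calendar dates); they are not taken from the OMOP CDM specification. The `occurs_during_exposure` helper is purely hypothetical, included only to show how the tables join on person.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConditionOccurrence:
    start_date: date
    person_id: int
    condition_code: int  # numerical code, meaning unknown


@dataclass(frozen=True)
class DrugExposure:
    start_date: date
    end_date: date
    person_id: int
    drug_code: int  # numerical code, meaning unknown


@dataclass(frozen=True)
class ObservationPeriod:
    start_date: date
    end_date: date
    person_id: int
    alive: bool  # person status (alive/dead)
    has_prescription_data: bool  # prescription data (yes/no)


@dataclass(frozen=True)
class Person:
    person_id: int
    birth_year: int
    gender: str  # "M" or "F"
    race: str  # "white" or "non-white"


@dataclass(frozen=True)
class LabeledAssociation:
    drug_code: int
    condition_code: int
    is_positive: bool  # True for a positive association


def occurs_during_exposure(condition: ConditionOccurrence,
                           exposure: DrugExposure) -> bool:
    """Hypothetical helper: same person, condition starts within the exposure."""
    return (condition.person_id == exposure.person_id
            and exposure.start_date <= condition.start_date <= exposure.end_date)


exposure = DrugExposure(date(2005, 1, 1), date(2005, 1, 31), 42, 1001)
condition = ConditionOccurrence(date(2005, 1, 15), 42, 2002)
print(occurs_during_exposure(condition, exposure))
```

A condition that starts during an exposure says nothing about causation by itself; the helper just illustrates the kind of join the tables allow.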

I have no idea if the drug and condition data link to real-world drugs and conditions, though the challenge indicates they want you to use outside data, so they probably do (I’ll post any links that people send me about ways to use this data). OMOP’s common data model (CDM) specification is huge, and all you get in the data files are numerical codes.

System Output

For Challenge 1 ($10K grand prize), you just provide a score for each drug/condition pair, and the scores are only used for ranking.

For Challenge 2 ($5K grand prize), there’s a time component I didn’t understand, and they’re only using the first 500 of the 5000 drugs.


For the first bakeoff, the evaluation metric is just average precision (see LingPipe’s ScoredPrecisionRecall class documentation or the task descriptions linked above for an explanation of (mean) average precision).

For the second bakeoff, the metric is mean average precision, where the mean is taken over years.
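Here is a minimal Python sketch of both metrics, using the common definition of average precision (the mean of precision-at-rank taken at the rank of each true positive). The function names and the input layout are my own, and the official task descriptions may differ in details such as tie handling, so treat this as an illustration rather than the organizers' scorer.

```python
from typing import Dict, Sequence, Tuple

ScoredPair = Tuple[float, bool]  # (system score, is a true association)


def average_precision(scored: Sequence[ScoredPair]) -> float:
    """Mean of precision-at-rank over the ranks of the true positives."""
    # Only the ordering matters, not the score values themselves.
    # Ties keep their input order because sorted() is stable.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(by_year: Dict[int, Sequence[ScoredPair]]) -> float:
    """Unweighted mean of per-year average precision."""
    return sum(average_precision(s) for s in by_year.values()) / len(by_year)


year_1 = [(0.9, True), (0.8, False), (0.7, True), (0.1, False)]
year_2 = [(0.5, False), (0.4, True)]
print(average_precision(year_1))  # (1/1 + 2/3) / 2
print(mean_average_precision({1: year_1, 2: year_2}))
```

In the first toy year the positives land at ranks 1 and 3, giving precisions of 1 and 2/3; in the second the single positive is at rank 2, giving 1/2.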

Leader Board

They’re supposed to have a leaderboard and a way of evaluating responses online as Netflix did. So far, I don’t see it on their site.

What’s OMOP?

OMOP is the Observational Medical Outcomes Partnership, a “public-private partnership”, the stated goal of which is to “improve the monitoring of drugs for safety”.

2 Responses to “OMOP Cup: Drug Safety Surveillance Bakeoff”

  1. lingpipe Says:

    David Madigan commented on Andrew’s blog that they got the wrong legalese! According to David, a fix is in the works.

  2. David Madigan Says:

    The OMOP cup leaderboard is at:
