Call for PhD Ideas for Lucene/Solr Implementation


Otis, over at the Sematext blog, is asking people to identify hard problems faced by search practitioners that PhD students could pick up and, ideally, solve with a Lucene/Solr implementation. The students get a ‘real world problem’ and the world gets a concrete open source implementation of the solution. The call is titled “Lucene / Solr for Academia: PhD Thesis Ideas”.

My suggested PhD idea is dictionary matching of phrases with high recall at tolerable precision. Mike Ross spent a good deal of time trying to match 100% of the gene mentions in MEDLINE abstracts, given a dictionary of genes (Entrez Gene) and their aliases. The core of the problem is that not every mention of a gene appears in that gene's alias set. On top of getting it to work at all, there are huge efficiency issues.
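To make the problem concrete, here is a minimal Python sketch. It is not Mike Ross's system and not a Lucene/Solr API. It assumes a toy dictionary with hypothetical gene IDs (`GENE_A`, `GENE_B`), normalizes aliases so surface variants like "IL-2" and "il2" collide, tries an exact lookup over token windows, and falls back to edit-distance matching to catch mentions that are not in the alias set. The fallback scans every alias linearly, which is exactly where efficiency falls apart with a dictionary the size of Entrez Gene.

```python
import re
from typing import Dict, List, Tuple

def normalize(phrase: str) -> str:
    """Lowercase and drop everything but letters and digits, so 'IL-2', 'IL 2', 'il2' collide."""
    return re.sub(r"[^a-z0-9]", "", phrase.lower())

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def build_index(dictionary: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each normalized alias to its (hypothetical) gene ID."""
    index = {}
    for gene_id, aliases in dictionary.items():
        for alias in aliases:
            index[normalize(alias)] = gene_id
    return index

def find_mentions(text: str, index: Dict[str, str], max_tokens: int = 4,
                  max_edits: int = 1, min_fuzzy_len: int = 6) -> List[Tuple[int, int, str, str]]:
    """Greedy longest-match over whitespace tokens; returns (start, end, gene_id, kind)."""
    tokens = text.split()
    mentions = []
    i = 0
    while i < len(tokens):
        found = None
        # Prefer the longest window starting at token i.
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            key = normalize(" ".join(tokens[i:i + n]))
            if not key:
                continue
            if key in index:
                found = (i, i + n, index[key], "exact")
                break
            if len(key) >= min_fuzzy_len:
                # Linear scan over all aliases: fine for a toy, hopeless at Entrez Gene scale.
                for alias_key, gene_id in index.items():
                    if (abs(len(alias_key) - len(key)) <= max_edits
                            and edit_distance(key, alias_key) <= max_edits):
                        found = (i, i + n, gene_id, "fuzzy")
                        break
                if found:
                    break
        if found:
            mentions.append(found)
            i = found[1]
        else:
            i += 1
    return mentions

if __name__ == "__main__":
    toy = {"GENE_A": ["IL-2", "interleukin 2"], "GENE_B": ["TP53", "tumor protein p53"]}
    idx = build_index(toy)
    print(find_mentions("Expression of interleukn 2 and il2 was higher than tp53.", idx))
```

The misspelled "interleukn 2" is caught only by the fuzzy fallback, which illustrates the recall side; every extra edit allowed or short key admitted to fuzzy matching trades precision for recall. A serious implementation would replace the linear scan with an index built for approximate lookup (Lucene's `FuzzyQuery`, for instance, uses Levenshtein automata rather than comparing against every term).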
