A Nobel in Computational Linguistics?


How so? Amongst the research geeks we see a BIG opportunity in
squeezing more information out of written human knowledge. Generally
that means the research literature, but it can extend to databases and
other "encodings" of information about the world. The squeezing
involves the transition from word-based search to concept-based search.

It's a big deal, and one you should personally care about if you think
you will need the serious attention of a doctor 15 years from now.
Making the leap should uncover a new world of therapies, treatments
and scientific understanding; at least that is the idea, and it is well
worth exploring. It is a cure-for-cancer-level achievement. As they
say at the Indy 500: "Researchers, start your graduate students."

A few details, but I am going to keep this sketchy. Mr. Search Engine
does a pretty good job of finding words in documents, but finding words
is a long way from finding every document in MEDLINE that mentions the
gene with ID 12, official symbol Serpina3. Why?

Not enough found: The concept for Serpina3 is expressed in
documents as 'ACT', 'GIG24' and 'AACT', amongst others, and
Mr. Search Engine misses these entirely. Attempts to help
Mr. Search Engine have pretty much failed up to now.

Too much found: The alias 'ACT' is highly ambiguous: it is shared
by several genes, and 'act' is also a common English word. It is
like searching for John Smith on the web. Mr. Search Engine doesn't
even grasp that lots of different things in the world are mentioned
the same way.
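To make the two failure modes concrete, here is a toy sketch. It assumes a hypothetical alias dictionary mapping surface strings to candidate concept IDs; `gene:12` and its aliases come from the post, while `gene:other-ACT` and `word:act` are made-up placeholders for the other things 'ACT' can mean, and the documents are invented. Literal keyword matching finds only the document that spells out Serpina3 (not enough found), while naive alias expansion pulls in every candidate, including a document about legislation (too much found).

```python
import re
from collections import defaultdict

# Hypothetical alias dictionary: surface string -> candidate concept IDs.
# "gene:12" (Serpina3) and its aliases are from the post; "gene:other-ACT"
# and "word:act" are placeholders for the other meanings of "ACT".
ALIASES = {
    "serpina3": {"gene:12"},
    "act": {"gene:12", "gene:other-ACT", "word:act"},
    "aact": {"gene:12"},
    "gig24": {"gene:12"},
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_hits(docs, word):
    """Documents containing the literal word: the Mr. Search Engine approach."""
    return {doc_id for doc_id, text in docs.items() if word.lower() in tokenize(text)}

def concept_candidates(docs):
    """Map each concept ID to the documents containing any alias that MIGHT denote it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            for concept in ALIASES.get(tok, ()):
                index[concept].add(doc_id)
    return index

# Invented toy documents.
docs = {
    "d1": "SERPINA3 expression rises in the liver",
    "d2": "Plasma AACT levels were elevated",
    "d3": "GIG24 was induced by the treatment",
    "d4": "Congress passed the act",
}

print(keyword_hits(docs, "Serpina3"))               # only d1
print(sorted(concept_candidates(docs)["gene:12"]))  # d1..d4, including the bogus d4
```

Deciding which candidate each occurrence actually denotes is the disambiguation problem; the sketch deliberately stops short of solving it.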

What is the payoff? Once you get concept indexing sorted out, you
can start playing games very effectively with some old ideas floated
by Don Swanson in '88, originally about migraines and dietary magnesium*.
The approach tries to find a disease A with underlying causes B, and
then find treatments C that apply to B but are not yet known to apply
to disease A.
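The A-B-C idea can be sketched as a search over a concept co-occurrence graph. This is a minimal sketch under my own assumptions, not Swanson's actual procedure: each document is reduced to the set of concept IDs an indexer found in it (exactly the step that is still unsolved), two concepts count as linked if they share a document, and the concept labels and toy corpus are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(doc_concepts):
    """Undirected concept graph: an edge means two concepts share a document."""
    links = defaultdict(set)
    for concepts in doc_concepts:
        for x, y in combinations(sorted(set(concepts)), 2):
            links[x].add(y)
            links[y].add(x)
    return links

def abc_candidates(doc_concepts, a, is_treatment):
    """Return {C: bridging Bs} for treatments C linked to some B linked to A,
    where C is not already linked directly to A."""
    links = cooccurrence(doc_concepts)
    found = defaultdict(set)
    for b in links[a]:
        for c in links[b]:
            if c != a and c not in links[a] and is_treatment(c):
                found[c].add(b)
    return dict(found)

# Hypothetical toy corpus: each document is a set of concept IDs.
docs = [
    {"disease:A", "cause:B1"},
    {"disease:A", "cause:B2"},
    {"cause:B1", "treatment:C1"},
    {"cause:B2", "treatment:C1"},
    {"cause:B2", "treatment:C2"},
    {"disease:A", "treatment:C2"},
]

print(abc_candidates(docs, "disease:A", lambda c: c.startswith("treatment:")))
```

Here treatment:C2 is dropped because the toy corpus already links it to disease:A, which is the "not yet known to apply to A" condition; treatment:C1 survives, bridged by both causes. With keyword lookup instead of concept IDs, every missed alias drops an edge and every ambiguous alias adds a false one.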

Nice idea, but it is pretty seriously limited if the A's, B's and C's
are found by keyword lookup. Make those concept lookups and
Dr. Swanson's approach will gain some serious traction. Once that
happens, I see Dr. Swanson and the folks who solve the concept indexing
problem enjoying some quality time in Stockholm. I hope they invite me
along for the celebration dinner.


* There are tons of other interesting ideas that would gain traction with
concept search as well. Swanson, however, is the first person I know of who
actually did something with one. Cite:

Swanson, D. R. (1988). Migraine and magnesium: eleven neglected
connections. Perspectives in Biology and Medicine, 31, 526–557.

One Response to “A Nobel in Computational Linguistics?”

  1. Data Mining Says:

    Word Sense Disambiguation in Text Mining

    Breck posts a concise piece on the value of moving away from strings/words to concepts in a search and text mining environment. One of the basic problems here – word sense ambiguity – is also nicely illustrated by the current