Aditya Kalyanpur presented an overview of the Jeopardy!-winning Watson computer system June 6 at TheLadders.com in New York for the New York Semantic Web Meetup. I was asked to present a three-minute overview of the state of Natural Language Processing (NLP). In this post I want to place the Watson system in the context of the state of the art, which I could not do at the meetup because I presented first.
The State of NLP According to Breck
Conveying the state of the art in three minutes is quite a challenge, so let's run with the analogy of aviation for ease of comprehension. So where is NLP?
We have achieved the analog of basic powered flight. No doubt.
But in no sense have we gotten to this level of performance.
My best guess is that we are at the point of a reasonable commercial foundation as an industry, with some changes to come that we don’t know about yet, not unlike aviation in the mid-1920s. Perhaps the beginning of the Golden Age of NLP.
And in no sense are we in the reliable, high technology commercial space that modern air transport provides.
Where Does Watson Fit in the Analogy?
Watson fits perfectly in the example of the red 1928 Lockheed Vega above for the following reasons:
- The Vega is actually Amelia Earhart’s plane. It was used to break records (crossing the Atlantic solo) and generate publicity, and it was a stunning success for a nascent industry.
- While inspirational, the Vega’s success had little to do with advancing the underlying technology. What would I consider an advancement of technology? Frank Whittle patented the turbojet in 1930.
- Watson shows how a 20-person team working for 4 years can win a very challenging game with skill, effort and daring, much in the same way that aviation records were broken. Don’t think some careers were not on the line with the Watson effort. I think IBM ceased termination by firing squad in the 1970s, so Earhart had more on the line. But what are the prospects of a mid-level ex-IBM exec in today’s economy? Perhaps the firing squad would be a kindness.
But Watson is Playing a Game
There is one issue that seriously concerns me: Watson won a question answering game with the trivial twist that the answers must be phrased as questions. So the clue “First President of the US” is answered with “Who is George Washington?”. But Watson is not a general purpose question answering system. What is the difference?
Another analogy: The game of chess is based on medieval battles, but even though Deep Blue beat the best human player, one would never consider using Deep Blue to manage an actual battle. Real war is messy, approximate and without clear rules, which makes chess algorithms totally inappropriate.
Real world question answering has similar qualities to real war: messy, approximate and no clear rules. The game of Jeopardy! is based on the existence of a unique, easily understood and verified answer given the clue. Taking one of the examples from the talk:
In 1698, this comet discoverer took a ship called the Paramour Pink on the first purely scientific sea voyage
The correct “question” is “Who is Edmond Halley?” of Halley’s Comet fame. The example is used to work through an impressive system diagram that resembles a well developed model train set (thanks to Prof. Mark Steedman for the simile). Much is done to generate the correct answer while avoiding distractors like Peter Sellers from the Pink Panther movies. But run the same clue past Google with “-watson -jeopardy” appended to eliminate pages that discuss this publicized example, and the first result is Halley’s Comet Stamps, with the first sentence mentioning the correct answer.
There is still an impressive amount of work in extracting the correct name, but the answer was ready to be found exactly because this is a game: the answer is unambiguous, well known and well selected given the clue.
What Does Real World Question Answering Look Like?
What kinds of questions have I approached a search engine with?
What is the current 30 year FHA mortgage rate?
This question is a disaster from the uniqueness of answer perspective. My initial search results were pretty low quality and did not match the rate I knew to be correct.
When is it best to ski in Chile?
This went better. There was a FAQ on the first page of results but the answer just went on and on. “The season runs from mid-June to mid-October. Although every year is different, and it comes down to Mother Nature, the best time for dry powder is mid-June, July, August, and up to the 2nd week in September. After that,….” Again we have a non-unique answer because my question was not that specific in the first place.
What is the Reputation of LingPipe?
This is a question that a group of Columbia MBA students took on for us in their small business program, which I recommend btw.
This question was hopeless in search because there is no page out there waiting to be found with our reputation nicely summarized. Answering the question requires distillation across many resources, even if the information were restricted to the web only.
Welcome to the real world: question answering is hell.
Where Might Watson Flourish Outside of Jeopardy! Tournaments?
Jeopardy! is a game of finding the uniquely obvious given indirect clues. Otherwise it is not a game that can be judged and played. What else in the world has this quality? The Watson team is now approaching medical diagnosis, which is a real world use case that might match the Jeopardy! game format, with symptoms as clues and diagnosis as the answer. Uniqueness is not guaranteed in diagnosis, but Watson can handle multiple answers. This is an area where computer systems from the 1970s, e.g. Mycin, outperformed experts, but they didn’t have an NLP component. Medical diagnosis, once symptoms are recognized, is a game-like problem.
In the end Watson is an engineering achievement, but in no way have the skills of a good reference librarian been replicated.
I came across an interesting article by Michael Lind on information technology and its role in productivity while writing this blog post. Interestingly, he puts information technology in the same time bracket as I do.