Sometimes I Wish NLP Systems Occasionally Blew Up

by Breck Baldwin

The consequences of a badly performing NLP (Natural Language Processing) system tend to be low key, low drama affairs. Bad things may happen, but not in a way that gets noticed the same way as: Rocket Go Boom

The failure of a rocket launch strongly motivates the engineers involved never to let it happen again. Miss that named entity, blam! If only NLP programmers had such explicit negative feedback; I think the field would be better for it.

NLP Systems are Easier to Sell than Build

Customers get the potential value of advanced NLP, text analytics and the like in the same way that people get the potential value of space flight.

[Image: space ships in artist's conception. Caption: The Dreaded Artist's Conception. Picture credit: NASA]

It would be so cool to do sentiment analysis in low Earth orbit! Sadly, the tremendous promise of the field is held back by a combination of overselling, under-delivering and a lack of awareness of how to build well-performing systems. What contributes most to poor performance?

Be aware that you are selling to the best NLP systems out there: Humans

One of the greatest frustrations I face is severely underfunded projects. Rockets, for the most part, get a much healthier dose of funding because people see the failures clearly and have no grasp of how rockets work. Not so much for NLP. Language processing is so easy for humans that selling it is like trying to sell cargo airplanes to eagles. They just don’t get what is hard, what is easy, or why you need infrastructure. “Mr. Eagle, um, well, we really need a runway to get the 20 tons of product into the air.” Mr. Eagle responds with, “What are you talking about? I can take off, land and raise a family on a tree branch. Cargo planes are easy because flying is easy for me. So I will give you a fish to do the job.”

Don’t ask a banker from 1994 to understand your tweets

Another source of poor performance is reliance on general-purpose solutions that are not well suited or tuned to the domain. It is unrealistic to expect a named entity model to perform well on Twitter if its training data is the 1994 Wall Street Journal. Modules customized for the domain can make a huge difference in performance, but customization costs money, takes time and requires a different mindset.
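
One cheap way to see that mismatch before spending anything is to check how much of your domain's vocabulary the generic model's training text even contains. Below is a minimal sketch in Python; the file names are hypothetical stand-ins for a sample of the generic training text and a sample of your own data.

    # Rough proxy for domain mismatch: what fraction of the tokens in our
    # domain sample never appear in the generic model's training text?
    # Both file names are hypothetical placeholders.

    def load_vocab(path):
        """Return the set of whitespace-delimited, lowercased tokens in a file."""
        with open(path, encoding="utf-8") as f:
            return {tok.lower() for line in f for tok in line.split()}

    train_vocab = load_vocab("wsj_1994_sample.txt")      # generic training text
    domain_tokens = []
    with open("our_tweets_sample.txt", encoding="utf-8") as f:
        for line in f:
            domain_tokens.extend(tok.lower() for tok in line.split())

    oov = sum(1 for tok in domain_tokens if tok not in train_vocab)
    print(f"out-of-vocabulary rate: {oov / len(domain_tokens):.1%}")

A high out-of-vocabulary rate does not settle anything on its own, but it is a quick, concrete way to show a stakeholder that the off-the-shelf model has never seen text like yours.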

Understand the problem clearly with hand-coded examples

The #1 bit of advice I give customers is to emulate by hand what you expect the system to do. Then show that to stakeholders to make sure the problem being solved addresses something your business cares about. Also, Mr. Eagle will better appreciate the need for another solution after ferrying 100 pounds of product one pound at a time. By doing this you will have cut the risk of failure in half, in my opinion.
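
Those hand-worked examples also double as your first evaluation set. Here is a minimal sketch of scoring a system against hand-labeled named entities; the entities and offsets below are invented purely for illustration.

    # Score system output against hand-labeled named entities.
    # Each entity is a (document id, start offset, end offset, type) tuple.
    # All of the data below is invented for illustration.

    gold = {                                  # what the humans marked up
        ("doc1", 0, 11, "PERSON"),
        ("doc1", 27, 38, "ORGANIZATION"),
        ("doc2", 5, 12, "LOCATION"),
    }
    system = {                                # what the NLP system produced
        ("doc1", 0, 11, "PERSON"),
        ("doc2", 5, 12, "ORGANIZATION"),      # right span, wrong type: a miss
    }

    true_positives = len(gold & system)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    print(f"precision={precision:.2f} recall={recall:.2f}")

Even a small set of such examples makes the conversation with stakeholders concrete: what the system got right, what it missed and whether that matters to the business.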

NLP is hard not just because it is technically difficult, but because it seems so easy for humans to do that they under-appreciate the challenges. If systems blew up spectacularly, we might have a better appreciation of that.

Breck

6 Responses to “Sometimes I Wish NLP Systems Occasionally Blew Up”

  1. daviddlewis Says:

    Great post. One spectacular blow-up in finance was when UAL’s stock lost a billion dollars or so of market value in 2008 after a 2002 newswire story was interpreted as new:

    http://www.reuters.com/article/2008/09/10/us-trading-ual-idUSN1039166420080910

  2. breckbaldwin Says:

    David,
    That is a great example of a blow-up. Any more out there, folks?

    Breck

  3. Bob Carpenter Says:

    I like the “inappropriate” spell correction (or continuation) suggestions. You don’t want your auto-completer to suggest “fuck” as the continuation to “fu” in most circumstances, even if the language and statistics suggest you should.

    Apparently, Amazon had to fix the behavior of suggesting ‘adoption’ as a spell correction for ‘abortion’. What looks like edit distance 2 plus higher frequency becomes a social statement. (Remember, it’s higher frequency of use in context, not use in the language as a whole, that matters here.)
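
    A minimal sketch of that mechanism in Python, with made-up frequency counts, shows how frequency breaks the tie among candidates within edit distance 2:

        # Levenshtein distance plus a frequency tie-break: the combination
        # that can turn 'abortion' -> 'adoption' into a suggestion.
        # The frequency counts below are made up.

        def edit_distance(a, b):
            """Classic dynamic-programming Levenshtein distance."""
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                cur = [i]
                for j, cb in enumerate(b, 1):
                    cur.append(min(prev[j] + 1,                  # deletion
                                   cur[j - 1] + 1,               # insertion
                                   prev[j - 1] + (ca != cb)))    # substitution
                prev = cur
            return prev[-1]

        query = "abortion"
        counts = {"abortion": 120, "adoption": 4500, "absorption": 900}
        candidates = [(w, n) for w, n in counts.items()
                      if w != query and edit_distance(query, w) <= 2]
        print(max(candidates, key=lambda wn: wn[1]))   # ('adoption', 4500)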

    And everyone loves the Google bomb, but that’s more malicious than a simple error, as in a search for [miserable failure] returning the White House. But in Google’s NLP world you have to deal with people gaming the system; it’s part of the data.

  4. Joseph Turian Says:

    Another NLP blow-up, on a smaller scale:

    http://en.wikipedia.org/wiki/Nigger#.22Nigger-brown.22_colored_furniture

  5. Avi Rappoport, SearchTools.com Says:

    Great article. I think CIOs should read it and look at NLP vendors with a critical eye.

  6. Paul Says:

    Not quite a billion-dollar stock dip, but a couple of years back Netbase got interesting press for releasing a health-themed semantic search engine trained on, among other data, Wikipedia. By failing to disambiguate AIDS from the verb “aids”, they ended up listing “jews” as a cause of AIDS, and recommended alcohol and coarse salt to get rid of them.

    http://languagelog.ldc.upenn.edu/nll/?p=1715
