The consequences of a badly performing NLP (Natural Language Processing) system tend to be low-key, low-drama events. Bad things may happen, but not in a way that gets noticed the way a failed rocket launch does.
The failure of a rocket launch strongly motivates the engineers involved to keep it from happening again. Miss that named entity, blam! If only NLP programmers had such explicit negative feedback. I think the field would be better for it.
NLP Systems are Easier to Sell than Build
Customers get the potential value of advanced NLP, text analytics and the like in the same way that people get the potential value of space flight.
It would be so cool to do sentiment analysis in low earth orbit! Sadly, the tremendous promise of the field is held back by a combination of overselling, under-delivering and a lack of awareness of how to build systems that perform well. What contributes most to poor performance?
Be aware that you are selling to the best NLP systems out there: Humans
One of the greatest frustrations I face is severely underfunded projects. For the most part, rockets get a much healthier dose of funding because people see the failures clearly and do not have a grasp of how rockets work. Not so for NLP. Language processing is so easy for humans that selling it is like trying to sell cargo airplanes to eagles. They just don’t get what is hard, what is easy and why infrastructure is necessary. “Mr. Eagle, um, well, we really need a runway to get the 20 tons of product into the air.” Mr. Eagle responds with “What are you talking about? I can take off, land and raise a family on a tree branch. Cargo planes are easy because flying is easy for me. So I will give you a fish to do the job.”
Don’t ask a banker from 1994 to understand your tweets
Another source of poor performance is the reliance on general-purpose solutions that are not well suited or tuned to the domain. It is unrealistic to expect a named entity model to perform well on Twitter if its training data is the 1994 Wall Street Journal. Modules customized for the domain can make a huge difference in performance. But customization costs money, takes time and requires a different mindset.
Understand the problem clearly with hand-coded examples
The #1 bit of advice I give customers is to emulate by hand what you expect the system to do. Then show the results to stakeholders to make sure the problem being solved is one your business cares about. Also, Mr. Eagle will much better appreciate the need for another solution after ferrying 100 pounds of product 1 pound at a time. By doing this you will have reduced the risk of failure by half, in my opinion.
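Once a handful of hand-coded examples exists, they can double as a tiny test set. The Python sketch below is one possible way to score a system's named entity output against them. The (text, type) entity representation, the function name score_entities and the sample data are all assumptions made for illustration, not part of any particular toolkit.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: an entity is (surface text, entity type).
Entity = Tuple[str, str]


def score_entities(gold: List[Set[Entity]],
                   predicted: List[Set[Entity]]) -> Dict[str, float]:
    """Compare system output with hand-coded answers, document by document."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")

    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # entities the system found correctly
        fp += len(p - g)   # entities the system invented
        fn += len(g - p)   # entities the system missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up hand-coded examples: what a person says the answer should be.
gold = [
    {("Acme Corp", "ORG"), ("Boston", "LOC")},
    {("Jane Doe", "PER")},
]
# Made-up system output for the same two documents.
predicted = [
    {("Acme Corp", "ORG")},
    {("Jane Doe", "PER"), ("Doe", "ORG")},
]

scores = score_entities(gold, predicted)
print(scores)
```

Even a few dozen examples scored this way give stakeholders a concrete number to argue about, and running the same examples through a general-purpose model and a domain-tuned one shows whether customization is worth paying for.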
NLP is hard because it is technically difficult, and it is made harder because language seems so easy for humans that they under-appreciate the challenges. If systems blew up spectacularly, we might have a better appreciation of that.