A skill developers typically don’t get in school is how to frame problems in terms of the messy, approximate world of heuristic and machine-learning-driven natural language processing. This blog entry should help shed some light on what remains a mostly self-taught black art. This is not the only way to do things, just my preferred way.
At the top level I seek three things:
1. Human-annotated data that directly encodes the intended output of the NLP program.
2. A brain-dead, completely simple program that connects all inputs to the intended output.
3. An evaluation setup that takes 1) and 2) and produces a score for how good a job the system did. That score should map to a management-approved objective.
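To make the three pieces concrete, here is a minimal sketch in Python. The task, data, and function names are all invented for illustration, not a real customer project: a tiny hand-annotated sentiment set, a brain-dead baseline that always predicts the most frequent label, and an accuracy score comparing the two.

```python
# 1) Human-annotated data: each input paired with its intended output.
#    (Hypothetical examples; real projects need far more, annotated by
#    the relevant parties.)
gold = [
    ("great product, works as advertised", "positive"),
    ("arrived broken, support never replied", "negative"),
    ("does the job", "positive"),
]

# 2) A brain-dead-simple program connecting input to output:
#    always predict the most frequent label in the annotated data.
def baseline(text):
    return "positive"

# 3) An evaluation setup that scores 2) against 1).
def accuracy(gold, predict):
    correct = sum(1 for text, label in gold if predict(text) == label)
    return correct / len(gold)

score = accuracy(gold, baseline)  # 2 of 3 correct
```

The point is not the score itself but that all three pieces exist and talk to each other on day one; everything after that is improving the number.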
Once I have the above, I can turn my attention to improving a score without worrying about whether I am solving the right problem (items 1 and 2 handle that), whether I have sorted out access to the raw data, or whether my rough architecture makes sense. Some more details on each point:
Human Annotated Data
If a human cannot carry out the task you expect the computer to do (given that we are doing NLP), then the project is extremely likely to fail. Humans are the best NLP systems in the world: they are amazing at language, yet they fail to appreciate the sophistication of what they do with zero effort. I almost always ask customers to provide annotated data before accepting work. What does this provide?
- Disambiguation: Annotated data forces a decision on what the NLP system is supposed to do and communicates that decision clearly to all involved parties. It also keeps the project from morphing away from what is being developed without an explicit renegotiation of the annotation.
- Buy-in by relevant parties: It is amazing what happens when you sit management, UI developers, and business development folks in a room and force them to take a text document and annotate it together. Disagreements that would otherwise surface at the end of a project surface immediately, people know what they are buying, and they get a sense that it might be hard. The majority of the hand-waving “Oh, just have the NLP do the right thing” goes away. Bonus points if you have multiple people annotate the same document independently and compare the results. If the agreement rate is low, how can you expect a piece of software to do the task?
- Evaluation: The annotated data is a starting place for evaluation to take place. You have gold-standard data to compare against. Without it you are flying blind.
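Checking that agreement rate between independent annotators can be sketched in a few lines. The annotators, tokens, and labels below are invented for illustration; Cohen's kappa is one standard measure that corrects raw agreement for the agreement you would expect by chance.

```python
from collections import Counter

def raw_agreement(a, b):
    """Fraction of items two annotators labeled identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement: 1.0 is perfect, 0.0 is chance level."""
    n = len(a)
    po = raw_agreement(a, b)  # observed agreement
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected chance agreement from each annotator's label distribution.
    pe = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

# Two hypothetical annotators labeling the same six tokens.
ann1 = ["PER", "ORG", "PER", "O", "O", "PER"]
ann2 = ["PER", "ORG", "O",   "O", "O", "PER"]
agreement = raw_agreement(ann1, ann2)  # 5 of 6 tokens match
kappa = cohens_kappa(ann1, ann2)
```

If kappa comes back low, the annotation guidelines are ambiguous, and no software should be expected to beat the humans who wrote them.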
Simple implementation that connects the bits
I am what Bob calls a "thin wire developer" because I prefer to reduce project risk by making sure early on that all the bits of software and information can talk to each other. I have been amazed at how difficult access to data/logs/programs can be in enterprise setups. Some judgment is required here: I want to hit the spots where there are likely blockers that may force completely different approaches (e.g., access to search engine logs for dynamic updates, or lists of names that should be tracked in the data). Once again this forces decisions early in development rather than later. Unfortunately it takes experience to know which bits are likely to be difficult to get and valuable in the end system.
An evaluation setup
An evaluation setup will truly save the day. It is very frustrating to build a system where the evaluation consists of "eyeballing data by hand" (I actually said this at my PhD defense, to the teasing delight of Michael Niv, a fellow graduate student, who to this day ponders my ocularly enhanced appendages). Some of the benefits are:
- Developers like a goal and like to see performance improve. It gets addictive and can be quite fun. You will get a better system as a result.
- If the evaluation numbers map well to the business objective then the NLP efforts are well aligned with what the business wants. (For graduate students, the business objective can simply be to win an academic bake-off.)
- Looks great to management: Tuning systems for better performance can be a long and opaque process to management. I got some good advice to always link the quality of the GUI (Graphical User Interface) to the quality of the underlying software, to communicate the state of the project transparently. An evaluation score that is better than last month's communicates the same thing, especially if management helped design the evaluation metric.
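For extraction-style NLP tasks, the score that gets reported upward is often precision/recall/F1 against the gold-standard annotations. A minimal sketch, with invented entities standing in for real gold data:

```python
def prf1(gold, predicted):
    """Precision, recall, and F1 over sets of extracted items."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # items the system got exactly right
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical gold annotations vs. system output for one document.
gold_entities = {("Acme Corp", "ORG"), ("Jane Doe", "PER"), ("Boston", "LOC")}
system_output = {("Acme Corp", "ORG"), ("Jane Doe", "PER"), ("Doe", "PER")}
p, r, f = prf1(gold_entities, system_output)
```

Which of precision or recall to favor is itself a business decision, which is exactly why management should help design the metric.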
I will likely continue this blog thread, taking up the above points in greater detail. Perhaps some use cases would be informative.