How Home Dentistry Kits and LingPipe Are Similar


LingPipe is a tough sell to most commercial customers without professional services. Occasionally I close a deal where all I do is cash a check, but almost all of our customers want a lot of help. Suggest that they build it themselves and they look at me as if I had suggested a home root canal. Why?

Take one of our simplest capabilities: language model classification. There is a simple tutorial that runs out of the box and walks a developer, line by line, through what is needed to classify text. It really is simple. Yet I cannot get certain customers working with it.

The sticking point, I believe, is that unfamiliarity combined with the slightly loose nature of machine learning techniques makes for too great a conceptual leap. The DynamicLMClassifier needs the category labels (easy), a boolean choice between a bounded and a sequence-based language model (starting to feel a little woozy), and a character n-gram size (whoa, a ‘whackety n-germ thingy’). The tutorial suggests 6 as a good initial n-gram size, but I think they are lost by this point. It gets worse, because in the tutorial I suggest they try different n-gram sizes to see which produces the best score; the scoring code is conveniently included in the tutorial as well. Things only get harder as we dig deeper into the LingPipe API.
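To make those three knobs a little less woozy, here is a toy Java sketch of a character n-gram language-model classifier. It is not LingPipe's DynamicLMClassifier and does not use its API; the class and method names (ToyCharLMClassifier, train, logProb, classify, accuracy) are hypothetical, and it uses plain add-one smoothing rather than the smoothing LingPipe actually uses. One model is trained per category label. The parameter `n` is the n-gram size, so each character is predicted from up to n-1 preceding characters. The `bounded` flag loosely mimics the bounded-vs-sequence choice by padding each text with start and end markers (an assumption of this sketch), so the model learns how whole texts begin and end instead of treating each text as a slice of an endless character stream. The `main` method runs the kind of sweep the tutorial recommends, trying n-gram sizes up to 6 and comparing held-out accuracy on a tiny made-up data set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy character n-gram LM classifier (hypothetical; NOT LingPipe's API).
// One model per category; a text goes to the category whose model gives it
// the highest log probability. Uses add-one (Laplace) smoothing.
class ToyCharLMClassifier {
    private static final char BOS = '\u0002'; // assumed start-of-text marker
    private static final char EOS = '\u0003'; // assumed end-of-text marker

    private final int n;           // n-gram size: each char is predicted from up to n-1 previous chars
    private final boolean bounded; // true: pad with BOS/EOS markers; false: raw character stream
    private final Map<String, Map<String, Integer>> ngramCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> contextCounts = new HashMap<>();
    private final Set<Character> vocab = new HashSet<>(); // characters seen in training

    ToyCharLMClassifier(String[] categories, int n, boolean bounded) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        this.n = n;
        this.bounded = bounded;
        for (String c : categories) {
            ngramCounts.put(c, new HashMap<>());
            contextCounts.put(c, new HashMap<>());
        }
    }

    // Bounded mode adds n-1 start markers and one end marker.
    private String pad(String text) {
        return bounded ? String.valueOf(BOS).repeat(n - 1) + text + EOS : text;
    }

    // First position whose character is predicted (start markers are context only).
    private int firstPredicted() {
        return bounded ? n - 1 : 0;
    }

    // Up to n-1 characters preceding position i.
    private String context(String s, int i) {
        return s.substring(Math.max(0, i - (n - 1)), i);
    }

    void train(String category, String text) {
        Map<String, Integer> ngrams = ngramCounts.get(category);
        Map<String, Integer> contexts = contextCounts.get(category);
        String s = pad(text);
        for (int i = firstPredicted(); i < s.length(); i++) {
            String ctx = context(s, i);
            ngrams.merge(ctx + s.charAt(i), 1, Integer::sum);
            contexts.merge(ctx, 1, Integer::sum);
            vocab.add(s.charAt(i));
        }
    }

    double logProb(String category, String text) {
        Map<String, Integer> ngrams = ngramCounts.get(category);
        Map<String, Integer> contexts = contextCounts.get(category);
        double v = vocab.size() + 1; // +1 reserves probability mass for unseen characters
        String s = pad(text);
        double lp = 0.0;
        for (int i = firstPredicted(); i < s.length(); i++) {
            String ctx = context(s, i);
            int num = ngrams.getOrDefault(ctx + s.charAt(i), 0);
            int den = contexts.getOrDefault(ctx, 0);
            lp += Math.log((num + 1.0) / (den + v));
        }
        return lp;
    }

    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : ngramCounts.keySet()) {
            double score = logProb(c, text);
            if (score > bestScore) { bestScore = score; best = c; }
        }
        return best;
    }

    double accuracy(String[] texts, String[] labels) {
        int correct = 0;
        for (int i = 0; i < texts.length; i++) {
            if (classify(texts[i]).equals(labels[i])) correct++;
        }
        return (double) correct / texts.length;
    }

    // Sweep n-gram sizes and both modes on a tiny made-up data set.
    public static void main(String[] args) {
        String[] cats = {"en", "fr"};
        String[][] train = {
            {"en", "the cat sat on the mat and the dog ate the bone"},
            {"en", "where is the station and when does the train leave"},
            {"fr", "le chat est sur le tapis et le chien mange"},
            {"fr", "ou est la gare et quand part le train"}};
        String[] heldOutTexts = {"the dog sat on the mat", "le chien est sur la table"};
        String[] heldOutLabels = {"en", "fr"};
        for (int n = 1; n <= 6; n++) {
            for (boolean bounded : new boolean[] {false, true}) {
                ToyCharLMClassifier clf = new ToyCharLMClassifier(cats, n, bounded);
                for (String[] ex : train) clf.train(ex[0], ex[1]);
                System.out.printf("n=%d bounded=%b accuracy=%.2f%n",
                    n, bounded, clf.accuracy(heldOutTexts, heldOutLabels));
            }
        }
    }
}
```

With four training sentences the accuracy figures mean little beyond showing the mechanics; a real sweep would use a proper training corpus and a separate held-out set, which is exactly the experimental habit the tutorial is trying to instill.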

Tuning these systems requires a particular mindset that is not part of the core computer science curriculum. It doesn’t require great intelligence, but experience is a must. Until we find a way to sort this out, such systems will stay out of general production use. My mantra is “make computational linguistics as easy to use as a database.” We have a ways to go before our field sheds its black-art status.


