We have started an incubator program for start-ups with natural language processing (NLP) needs. Seer is our first egg. The company creates productivity apps based on unstructured data sources–that is where LingPIpe fits in. The guys (Conall and Joe) fit a pattern that we see quite often–smart people with a good idea that presupposes NLP but they don’t have much experience with it.
The GetSeer guys were on the ball right away because they had a bunch of data annotated and had cobbled together a regular expression based classifier that served as an excellent starting point. We had machine learning based classifiers up and running within two hours of starting. That included an evaluation harness. Nice.
The great thing about our working setup is that I get to teach/advise which keeps my time commitment manageable and they get to do most of the heavy lifting which is how they learn to build and maintain NLP systems. I have had to learn some Scala since I co-code most of the time when we work together and that is their implementation language. Scala is a hip extension of Java with a less verbose syntax and stronger type inference.
I’ll keep the blog updated with developments. Current status:
- There is a small amount of training data.
- Current results are awful (no surprise).
- We have been roughing in the solution with Naive Bayes. Next steps will be opening a can of logistic regression for more interesting feature extraction.