LingPipe Incubator Welcomes Seer

We have started an incubator program for start-ups with natural language processing (NLP) needs. Seer is our first egg. The company creates productivity apps based on unstructured data sources, and that is where LingPipe fits in. The guys (Conall and Joe) fit a pattern that we see quite often: smart people with a good idea that presupposes NLP, but without much experience with it.

The GetSeer guys were on the ball right away: they had a bunch of data annotated and had cobbled together a regular-expression-based classifier that served as an excellent starting point. We had machine-learning-based classifiers up and running within two hours of starting, and that included an evaluation harness. Nice.
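
For readers who have not built one, here is a minimal sketch of what a regular-expression baseline plus an evaluation harness can look like. It is written in Python for brevity rather than in Seer's Scala, it does not use the LingPipe API, and the labels, patterns, and example texts are all invented for illustration. None of it is Seer's actual code or data.

```python
import re
from collections import Counter

# Hypothetical labels and patterns, made up for this sketch.
PATTERNS = [
    ("meeting", re.compile(r"\b(meet|meeting|calendar|schedule)\b", re.I)),
    ("travel", re.compile(r"\b(flight|hotel|itinerary)\b", re.I)),
]
DEFAULT_LABEL = "other"


def regex_classify(text):
    """Return the label of the first pattern that matches, else the default."""
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return DEFAULT_LABEL


def evaluate(classify, labeled_examples):
    """Score any classifier function on (text, gold_label) pairs.

    Returns overall accuracy and a Counter keyed by (gold, predicted).
    """
    confusion = Counter()
    for text, gold in labeled_examples:
        confusion[(gold, classify(text))] += 1
    total = sum(confusion.values())
    correct = sum(n for (gold, pred), n in confusion.items() if gold == pred)
    accuracy = correct / total if total else 0.0
    return accuracy, confusion


# Toy annotated data, invented for this sketch.
EXAMPLES = [
    ("Can we schedule a meeting on Friday?", "meeting"),
    ("Your flight to Boston is confirmed", "travel"),
    ("Lunch was great", "other"),
    ("Hotel booking attached", "travel"),
    ("See you at the meetup", "meeting"),  # the regex misses "meetup"
]

accuracy, confusion = evaluate(regex_classify, EXAMPLES)
print(f"accuracy = {accuracy:.2f}")
for (gold, pred), n in sorted(confusion.items()):
    print(f"gold={gold:8s} predicted={pred:8s} count={n}")
```

The useful design point is that `evaluate` only needs a function from text to label, so a learned classifier can be dropped in beside the regex baseline and scored on exactly the same data.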

The great thing about our working setup is that I get to teach and advise, which keeps my time commitment manageable, and they get to do most of the heavy lifting, which is how they learn to build and maintain NLP systems. I have had to learn some Scala, since I co-code most of the time when we work together and that is their implementation language. Scala is a hip JVM language that interoperates with Java and offers a less verbose syntax and stronger type inference.

I’ll keep the blog updated with developments. Current status:

  • There is a small amount of training data.
  • Current results are awful (no surprise).
  • We have been roughing in the solution with Naive Bayes. Next steps will be opening a can of logistic regression for more interesting feature extraction.
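
As a rough illustration of the naive Bayes step, here is a minimal multinomial naive Bayes sketch with additive smoothing. It is plain Python, not LingPipe's implementation and not Seer's code. The whitespace tokenizer, the `alpha` smoothing parameter, the class and method names, and the training texts are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes over tokens with additive (alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()             # documents seen per label
        self.token_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def train(self, text, label):
        self.label_counts[label] += 1
        for tok in tokenize(text):
            self.token_counts[label][tok] += 1
            self.vocab.add(tok)

    def log_joint(self, text):
        """Return log p(label) + sum of log p(token | label) for each label."""
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_joint(text)
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch.
nb = NaiveBayes()
nb.train("schedule a meeting friday", "meeting")
nb.train("meeting moved to monday", "meeting")
nb.train("flight to boston confirmed", "travel")
nb.train("hotel and flight booked", "travel")

print(nb.classify("is the meeting on friday"))
print(nb.classify("flight booked"))
```

With only a handful of training examples the smoothing term dominates the estimates, which is one reason results on a small training set tend to be poor.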

2 Responses to “LingPipe Incubator Welcomes Seer”

  1. Bob Carpenter Says:

    Sounds like a good use case for the “traditional” version of naive Bayes I wrote that supports semi-supervised learning via the EM algorithm. Here’s the tutorial:

    http://alias-i.com/lingpipe/demos/tutorial/em/read-me.html

    Probably the wrong title for it, because it’s not about EM per se as much as semi-supervised learning.

    I would also very much support the idea of building a learn-a-little, tag-a-little interface for labeling more data. Should be much easier to build than the named-entity version that we already have, and perhaps more widely applicable.

    • breckbaldwin Says:

      We are about to pitch off into an active learning phase. Krishna and I have been talking about setting up such an interface as a web service. It is a very common pattern for building up these classifiers. Hmmm….
