SIGHan ’06: Chinese Word Segmentation and Named Entity Recognition


LingPipe 2.3 (not out yet) includes a named-entity adapter that rescores n-best output, along with an implementation of that rescoring based on longer-distance character language models. We used it for the named-entity portion of the 3rd International Chinese Language Processing Bakeoff. For word segmentation, we used exactly the same implementation as in our Chinese Word Segmentation Tutorial.
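The rescorer’s internals aren’t described here, so the following is only a rough illustration of the general n-best rescoring idea, under stated assumptions. A first-pass recognizer (not shown) is assumed to return candidate chunkings with log scores. Each candidate is then rescored by splitting the text into entity and non-entity ("O") segments and scoring each segment with a per-type character language model. `CharNGramLM`, `rescore`, the bigram order, add-one smoothing, the vocabulary size of 5000, and the toy training data are all hypothetical choices made for brevity. They are not LingPipe’s API, and the models LingPipe uses have longer contexts.

```python
import math
from collections import defaultdict


class CharNGramLM:
    """Character n-gram language model with add-one smoothing.

    Hypothetical helper for illustration only; not LingPipe's API.
    """

    def __init__(self, n, vocab_size):
        self.n = n
        self.vocab_size = vocab_size            # assumed size of the character set
        self.counts = defaultdict(int)          # (context, char) -> count
        self.context_counts = defaultdict(int)  # context -> count

    def _events(self, text):
        # Pad the start so every character has an (n-1)-character context.
        padded = "^" * (self.n - 1) + text
        for i in range(self.n - 1, len(padded)):
            yield padded[i - self.n + 1:i], padded[i]

    def train(self, text):
        for ctx, ch in self._events(text):
            self.counts[(ctx, ch)] += 1
            self.context_counts[ctx] += 1

    def log_prob(self, text):
        total = 0.0
        for ctx, ch in self._events(text):
            num = self.counts[(ctx, ch)] + 1
            den = self.context_counts[ctx] + self.vocab_size
            total += math.log(num / den)
        return total


def rescore(text, nbest, lms, weight=1.0):
    """Re-rank (first_pass_score, chunks) pairs by first-pass + weighted LM score.

    chunks is a list of (start, end, type); characters outside any chunk
    are scored by the "O" model. Hypothetical function for illustration.
    """
    def lm_score(chunks):
        total, pos = 0.0, 0
        for start, end, label in sorted(chunks):
            if pos < start:  # non-entity gap before this chunk
                total += lms["O"].log_prob(text[pos:start])
            total += lms[label].log_prob(text[start:end])
            pos = end
        if pos < len(text):  # trailing non-entity text
            total += lms["O"].log_prob(text[pos:])
        return total

    scored = [(fp + weight * lm_score(chunks), chunks) for fp, chunks in nbest]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy training data (hypothetical).
lms = {label: CharNGramLM(2, 5000) for label in ("PER", "LOC", "O")}
for name in ("王小明", "李小红"):
    lms["PER"].train(name)
for place in ("北京", "上海"):
    lms["LOC"].train(place)
for other in ("在", "工作"):
    lms["O"].train(other)

text = "王小明在北京工作"
# Hypothetical first-pass n-best list: (log score, [(start, end, type), ...]).
nbest = [
    (-1.9, [(0, 3, "LOC"), (4, 6, "PER")]),
    (-2.0, [(0, 3, "PER"), (4, 6, "LOC")]),
]
best_score, best_chunks = rescore(text, nbest, lms)[0]
print(best_chunks)  # [(0, 3, 'PER'), (4, 6, 'LOC')]
```

The first pass slightly prefers the wrong labeling, but the character models promote the correct one. Scoring every character, not just the entities, keeps candidates with different numbers of entities comparable. The `weight` parameter trades off the first-pass model against the rescoring models.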

There were four word-segmentation corpora and two named-entity corpora (well, three, but we ignored the LDC’s because it came in its own format and required even more licenses to download). LingPipe fared pretty well, finishing near the median in most evaluations (which also put it near the top). Our best segmentation performance was .972 F-measure, and our best named-entity recognition (person/location/organization) performance was .855 F-measure. These were, respectively, 1.1% and 3.5% below the best closed-track (no external resources) scores in the bakeoff.
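F-measure is the harmonic mean of precision and recall. For segmentation, a predicted word usually counts as correct only if its exact character span matches a gold-standard word. The following minimal sketch uses that standard span-based definition. It is not the bakeoff’s official scoring script, which may differ in details. The helper names and the example sentence are hypothetical. The same `f_measure` also works for named entities if each span is a `(start, end, type)` triple.

```python
def word_spans(words):
    """Map a segmented sentence to its set of (start, end) character offsets."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def f_measure(gold, predicted):
    """Harmonic mean of precision and recall over exactly matching spans."""
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = word_spans(["我们", "在", "北京"])
predicted = word_spans(["我们", "在北京"])
print(f_measure(gold, predicted))
```

Here only "我们" matches. Precision is therefore 1/2 and recall is 1/3, which gives F = 2(1/2)(1/3) / (1/2 + 1/3) = 0.4.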

Full details are available from our system write-up:

We hope to release LingPipe 2.3 in the next month or two; we’re pretty busy with commercial applications work right now.
