Release: LingPipe GUI Named Entity and Chunk Annotator


We’re soft-releasing our new GUI tool for annotating corpora with named entities or other chunks. It’s set up a little differently from other such tools you might be familiar with, such as Callisto or WordFreak. It’s token-oriented, with combo-box controls for tagging, so the GUI view looks a lot like the CoNLL named entity data. The goal was to make it easy to drive from the keyboard, rather than having it work like a text editor where you highlight a span and select from a menu.
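For readers who haven’t seen token-oriented annotation, here is a minimal sketch of the general idea in plain Java: one tag per token, with chunks recovered by reading the tags in order. The BIO-style tags (B-PER, I-PER, O), the class name BioChunks, and the decode method are all assumptions made up for illustration; this is not the annotator’s code or its XML format, and it doesn’t use the LingPipe API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: decodes BIO-style per-token tags into chunks.
// The tag scheme and all names here are assumptions, not the annotator's format.
public class BioChunks {

    // Returns chunks as strings of the form "TYPE: token token ...".
    static List<String> decode(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("need exactly one tag per token");
        }
        List<String> chunks = new ArrayList<>();
        String type = null;                      // type of the open chunk, if any
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            // An I- tag continues the open chunk only if the types match.
            boolean inside = tag.startsWith("I-") && tag.substring(2).equals(type);
            if (inside) {
                text.append(' ').append(tokens[i]);
                continue;
            }
            // Any other tag closes the open chunk.
            if (type != null) {
                chunks.add(type + ": " + text);
                type = null;
            }
            // B- starts a new chunk; a stray I- is leniently treated the same way.
            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                type = tag.substring(2);
                text = new StringBuilder(tokens[i]);
            }
        }
        if (type != null) {
            chunks.add(type + ": " + text);      // close a chunk that runs to the end
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "Smith", "visited", "Boston"};
        String[] tags = {"B-PER", "I-PER", "O", "B-LOC"};
        for (String chunk : decode(tokens, tags)) {
            System.out.println(chunk);
        }
    }
}
```

With one tag per token, changing an annotation means changing a single token’s tag, which is the kind of edit that’s easy to make from the keyboard.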

It supports tag-a-little, learn-a-little semi-automatic annotation (à la MITRE’s seminal Alembic Workbench) and tracks progress through a corpus annotation project. It assumes inputs are in a specified XML format.

For now, the annotator’s in the sandbox, under the project name citationEntities.

The name arose because the project started as something to annotate a bunch of bibliographic data.

Even though it’s in the sandbox, I’ve used it to annotate a ton of data already and I’ve ironed out all the bugs I’ve found.

You can find our general instructions for checking projects out of the sandbox at:

You can also use the following command to download the entire project into the current working directory:

cvs -d
checkout citationEntities

Just make sure to put it all on one line; it’s on two here because of the blog formatting.

The place to start is the top-level read-me file:


Let us know if you have any comments at