We’re soft-releasing our new GUI tool for annotating corpora with named entities or other chunks. It’s set up a little differently than other such tools with which you might be familiar, such as Callisto or WordFreak. It’s basically token-oriented with combo-box controls, making it look a lot like the CoNLL named entity data in the GUI view. The goal was to make it easy to drive from the keyboard rather than working as a text editor with highlight and menu select.
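To make the token-oriented view concrete, here's a minimal sketch of how per-token tags in the CoNLL style map back to entity chunks. It assumes a BIO tagging scheme (B-X begins a chunk of type X, I-X continues it, O is outside any chunk). The function name `bio_to_chunks` and the example sentence are made up for illustration, and nothing here is taken from the annotator's own code or file format.

```python
from typing import List, Tuple

Chunk = Tuple[str, int, int, str]  # (type, start token, end token (exclusive), text)


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Chunk]:
    """Collect BIO-tagged tokens into chunks (hypothetical helper, not the tool's API)."""
    chunks: List[Chunk] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An O tag, a B- tag, or an I- tag of a different type ends any open chunk.
        ends_open = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != label)
        )
        if ends_open and label is not None:
            chunks.append((label, start, i, " ".join(tokens[start:i])))
            start, label = None, None
        # Any non-O tag with no open chunk starts a new one (lenient on stray I- tags).
        if tag != "O" and label is None:
            start, label = i, tag[2:]
    if label is not None:
        chunks.append((label, start, len(tags), " ".join(tokens[start:])))
    return chunks


tokens = ["John", "Smith", "works", "at", "Acme", "Corp", "in", "Boston", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
for chunk in bio_to_chunks(tokens, tags):
    print(chunk)
```

Picking one tag per token from a combo box amounts to editing the `tags` list directly, which is why this kind of layout is easy to drive from the keyboard.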
It does tag-a-little, learn-a-little style semi-automatic annotation (a la MITRE’s seminal Alembic Workbench) and tracks progress through a corpus annotation project. It assumes inputs are in a specified XML format.
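The tag-a-little, learn-a-little loop can be sketched roughly as follows: the model proposes tags for the next sentence, the annotator corrects them, and the corrected sentence is fed back as training data. Everything in this sketch is an assumption for illustration. The `MostFrequentTagModel` class is a toy stand-in that just remembers the most common tag per token, not the model the annotator actually uses, and the `correct` callback simulates a human by looking up gold tags.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List


class MostFrequentTagModel:
    """Toy stand-in model: suggests each token's most frequently seen tag."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, tokens: List[str], tags: List[str]) -> None:
        for token, tag in zip(tokens, tags):
            self.counts[token][tag] += 1

    def suggest(self, tokens: List[str]) -> List[str]:
        # Unseen tokens default to "O" (outside any chunk).
        return [
            self.counts[t].most_common(1)[0][0] if t in self.counts else "O"
            for t in tokens
        ]


def annotate_corpus(
    sentences: List[List[str]],
    correct: Callable[[List[str], List[str]], List[str]],
) -> List[int]:
    """Run the loop and return how many suggested tags were changed per sentence."""
    model = MostFrequentTagModel()
    corrections = []
    for tokens in sentences:
        suggested = model.suggest(tokens)   # tag a little (automatically)
        final = correct(tokens, suggested)  # annotator fixes the suggestions
        corrections.append(sum(s != f for s, f in zip(suggested, final)))
        model.train(tokens, final)          # learn a little
    return corrections


# Simulated annotator: looks up made-up gold tags instead of asking a person.
gold = {
    ("Smith", "wrote", "it"): ["B-PER", "O", "O"],
    ("Smith", "read", "it"): ["B-PER", "O", "O"],
}
sentences = [list(key) for key in gold]
print(annotate_corpus(sentences, lambda tokens, suggested: gold[tuple(tokens)]))
```

In this toy run the first sentence needs one correction and the second needs none, because the model has already seen "Smith" tagged as a person. That's the payoff of the approach: the amount of manual correction should drop as annotation proceeds.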
For now, the annotator’s in the sandbox, under project name citationEntities.
The name arose because the project started as something to annotate a bunch of bibliographic data.
Even though it’s in the sandbox, I’ve used it to annotate a ton of data already and I’ve ironed out all the bugs I’ve found.
You can find our general instructions for checking projects out of the sandbox at:
You can also use the following command to download the entire project into the current working directory:
cvs -d :pserver:firstname.lastname@example.org:/usr/local/sandbox checkout citationEntities
Just make sure to enter it all on one line.
The place to start is the top-level read-me file:
Let us know if you have any comments at email@example.com