- Work Product: POS Taggers
with this quote:
I surely just need to test a bunch of them [part of speech taggers] in some semi-systematic way, but is there any existing consensus about what works best for literary material?
I would highly recommend reading from the first post in the thread forward. It’s a great fly-on-the-wall view of a non-specialist coming to grips with natural language processing.
Over the past three months, Matthew’s evaluated a whole bunch of different part-of-speech taggers looking for something that’ll satisfy his accuracy, speed, and licensing needs. The data is literary English, for now, mostly culled from Project Gutenberg.
The current entry, Evaluating POS Taggers: Conclusions, dated 27 January 2009, starts with:
OK, I’m as done as I care to be with the evaluation stage of this tagging business, which has taken the better part of three months of intermittent work. This for a project that I thought would take a week or two. There’s a lesson here, surely, …
Amen, brother. That’s one reason why Kevin Cohen’s organizing the software workshops at ACL (this year Marc Light co-chairs, but I’m still on the PC). So I suggested to Matthew that he submit this diary of his work to the NAACL 2009 Software Workshop, which is explicitly calling for just such case studies.