<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Spinning Straw into Gold: How to do gold standard data right</title>
	<atom:link href="http://lingpipe-blog.com/2008/03/08/spinning-straw-into-gold-how-to-do-gold-standard-data-right/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2008/03/08/spinning-straw-into-gold-how-to-do-gold-standard-data-right/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:47:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Breck</title>
		<link>http://lingpipe-blog.com/2008/03/08/spinning-straw-into-gold-how-to-do-gold-standard-data-right/#comment-2002</link>
		<dc:creator><![CDATA[Breck]]></dc:creator>
		<pubDate>Wed, 12 Mar 2008 14:27:23 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=79#comment-2002</guid>
		<description><![CDATA[Given that we suspect that my and Bob&#039;s recall errors are correlated--meaning we tend to miss the same things--the question becomes: how many annotators do we need to get 100% recall? I estimated that I was missing 5% of the recall upon adjudication, as was Bob. If our misses were independent, the union of our annotations would miss only 5% of 5%, or 0.25%, of recall. We need a third annotator to decide whether the union of me and Bob misses 0.25% of recall against the third annotation. If so, that supports independence of my and Bob&#039;s recall oversights. If the union annotation misses more than that, then we have evidence that my and Bob&#039;s errors are correlated. All of this has to be wrapped in a meaningful statistical analysis over many iterations, but that is the general idea.

It would be a fairly easy experiment to run on 20 abstracts to see when recall stops increasing as new annotators are added. I am guessing you would start to see little that is new at around 4-5 annotators, but I don&#039;t know.

breck]]></description>
		<content:encoded><![CDATA[<p>Given that we suspect that my and Bob&#8217;s recall errors are correlated&#8211;meaning we tend to miss the same things&#8211;the question becomes: how many annotators do we need to get 100% recall? I estimated that I was missing 5% of the recall upon adjudication, as was Bob. If our misses were independent, the union of our annotations would miss only 5% of 5%, or 0.25%, of recall. We need a third annotator to decide whether the union of me and Bob misses 0.25% of recall against the third annotation. If so, that supports independence of my and Bob&#8217;s recall oversights. If the union annotation misses more than that, then we have evidence that my and Bob&#8217;s errors are correlated. All of this has to be wrapped in a meaningful statistical analysis over many iterations, but that is the general idea.</p>
<p>It would be a fairly easy experiment to run on 20 abstracts to see when recall stops increasing as new annotators are added. I am guessing you would start to see little that is new at around 4-5 annotators, but I don&#8217;t know.</p>
<p>breck</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris</title>
		<link>http://lingpipe-blog.com/2008/03/08/spinning-straw-into-gold-how-to-do-gold-standard-data-right/#comment-1999</link>
		<dc:creator><![CDATA[Chris]]></dc:creator>
		<pubDate>Tue, 11 Mar 2008 14:26:15 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=79#comment-1999</guid>
		<description><![CDATA[Does the need for 90% recall have to do with only using two raters?  Imagine using 100 raters, or 1000.  How low could the agreement requirements go given larger numbers of annotators?  I&#039;m asking because I&#039;ve just blogged about the feasibility of using a web-based app to get raters from all over the world to complete large-scale annotation projects.  It may be a pipe dream, even if not a LingPipe dream ... rimshot!]]></description>
		<content:encoded><![CDATA[<p>Does the need for 90% recall have to do with only using two raters?  Imagine using 100 raters, or 1000.  How low could the agreement requirements go given larger numbers of annotators?  I&#8217;m asking because I&#8217;ve just blogged about the feasibility of using a web-based app to get raters from all over the world to complete large-scale annotation projects.  It may be a pipe dream, even if not a LingPipe dream &#8230; rimshot!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2008/03/08/spinning-straw-into-gold-how-to-do-gold-standard-data-right/#comment-1996</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Mon, 10 Mar 2008 23:12:51 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=79#comment-1996</guid>
		<description><![CDATA[The real question I have is:  is the truth out there?  

Reporting 90% recall for an annotator assumes the 90% is 90% of something.  Philosophically old-fashioned (or naive) practitioners might call this  &quot;the truth&quot;, but Breck and I studied Quine and have adopted &lt;a href=&quot;http://en.wikipedia.org/wiki/Pragmatism&quot; rel=&quot;nofollow&quot;&gt;pragmatism&lt;/a&gt; as the official semantic theory of Alias-i.  

What we&#039;re really proposing is the adoption of a pragmatic evaluation of recall that bypasses the need for &quot;truth&quot;.

On a technical note, what I (Bob) would actually like to do is get multiple annotators and try to measure just how correlated the errors are.  If the errors were independent, two 90% recall annotations would jointly miss only 10% of 10%, or 1%.  I&#039;m guessing they&#039;ll be highly correlated, and thus we&#039;ll need more than two 90% recall annotations to get to 99% recall, even measured pragmatically.]]></description>
		<content:encoded><![CDATA[<p>The real question I have is:  is the truth out there?  </p>
<p>Reporting 90% recall for an annotator assumes the 90% is 90% of something.  Philosophically old-fashioned (or naive) practitioners might call this  &#8220;the truth&#8221;, but Breck and I studied Quine and have adopted <a href="http://en.wikipedia.org/wiki/Pragmatism" rel="nofollow">pragmatism</a> as the official semantic theory of Alias-i.  </p>
<p>What we&#8217;re really proposing is the adoption of a pragmatic evaluation of recall that bypasses the need for &#8220;truth&#8221;.</p>
<p>On a technical note, what I (Bob) would actually like to do is get multiple annotators and try to measure just how correlated the errors are.  If the errors were independent, two 90% recall annotations would jointly miss only 10% of 10%, or 1%.  I&#8217;m guessing they&#8217;ll be highly correlated, and thus we&#8217;ll need more than two 90% recall annotations to get to 99% recall, even measured pragmatically.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

