<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Deleting Values in Welford&#8217;s Algorithm for Online Mean and Variance</title>
	<atom:link href="http://lingpipe-blog.com/2009/07/07/welford-s-algorithm-delete-online-mean-variance-deviation/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2009/07/07/welford-s-algorithm-delete-online-mean-variance-deviation/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Fri, 07 Jun 2013 17:17:33 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2009/07/07/welford-s-algorithm-delete-online-mean-variance-deviation/#comment-7633</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Mon, 09 Aug 2010 17:32:16 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1771#comment-7633</guid>
		<description><![CDATA[This is not the place to ask a general question!  We have a mailing list and e-mail (see the LingPipe home page).

The usual thing to do is to treat the word-to-count maps as vectors with the words as the dimensions and then use standard vector cosine to compare them.  This is all implemented in LingPipe, though it has nothing to do with this post.  Often, there&#039;s a TF/IDF rescaling of the counts.  Check out LingPipe&#039;s class &lt;a href=&quot;http://alias-i.com/lingpipe/docs/api/com/aliasi/spell/TfIdfDistance.html&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;spell.TfIdfDistance&lt;/code&gt;&lt;/a&gt; for details and an implementation.]]></description>
		<content:encoded><![CDATA[<p>This is not the place to ask a general question!  We have a mailing list and e-mail (see the LingPipe home page).</p>
<p>The usual thing to do is to treat the word-to-count maps as vectors with the words as the dimensions and then use standard vector cosine to compare them.  This is all implemented in LingPipe, though it has nothing to do with this post.  Often, there&#8217;s a TF/IDF rescaling of the counts.  Check out LingPipe&#8217;s class <a href="http://alias-i.com/lingpipe/docs/api/com/aliasi/spell/TfIdfDistance.html" rel="nofollow"><code>spell.TfIdfDistance</code></a> for details and an implementation.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph</title>
		<link>http://lingpipe-blog.com/2009/07/07/welford-s-algorithm-delete-online-mean-variance-deviation/#comment-7629</link>
		<dc:creator><![CDATA[Joseph]]></dc:creator>
		<pubDate>Mon, 09 Aug 2010 10:30:06 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1771#comment-7629</guid>
		<description><![CDATA[I&#039;m a student at University of Manitoba and had a question I thought you might be able to answer.  I&#039;m trying to identify an algorithm that would be best suited to help me identify the relationship between two arrays (each array generated from a separate archival story/article).  Each array contains 1 column of keyword entries paired with a second column giving the frequency with which each keyword occurs within the original text.

The idea is that I should be able to compare these two arrays and identify if there are enough keywords in common to deem these two articles related.

Any suggestions on stats models or formulas to use?  Point me in the right direction?

Thanks!]]></description>
		<content:encoded><![CDATA[<p>I&#8217;m a student at University of Manitoba and had a question I thought you might be able to answer.  I&#8217;m trying to identify an algorithm that would be best suited to help me identify the relationship between two arrays (each array generated from a separate archival story/article).  Each array contains 1 column of keyword entries paired with a second column giving the frequency with which each keyword occurs within the original text.</p>
<p>The idea is that I should be able to compare these two arrays and identify if there are enough keywords in common to deem these two articles related.</p>
<p>Any suggestions on stats models or formulas to use?  Point me in the right direction?</p>
<p>Thanks!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
