<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Chain Conditional Random Fields: Implementation and Design Issues</title>
	<atom:link href="http://lingpipe-blog.com/2009/06/18/chainconditional-random-fields-implementation-and-design/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2009/06/18/chainconditional-random-fields-implementation-and-design/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:47:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2009/06/18/chainconditional-random-fields-implementation-and-design/#comment-4948</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Thu, 25 Jun 2009 17:38:55 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1611#comment-4948</guid>
		<description><![CDATA[@Bernhard: Yes, and no.  At a high level, it&#039;s essentially a logistic regression problem at the p(tags&#124;tokens) level.  In chain CRFs, the input features are factored to only depend on pairs of adjacent tags and all the tokens.  So each feature corresponds to exactly one dimension of the vector of regression coefficients.  

What makes CRFs computationally challenging is making sure p(tags&#124;words) sums to 1.0 over the exponentially sized set of possible tag sequences. In practice, you usually start with something proportional to p(tags&#124;words) and then normalize by dividing by its sum over all tag sequences.  The challenge is that there are exponentially many sequences of tags.  The good news is that you can use a version of forward/backward (a dynamic programming algorithm whose run time is linear in the length of the input) to compute the sum efficiently.]]></description>
		<content:encoded><![CDATA[<p>@Bernhard: Yes, and no.  At a high level, it&#8217;s essentially a logistic regression problem at the p(tags|tokens) level.  In chain CRFs, the input features are factored to only depend on pairs of adjacent tags and all the tokens.  So each feature corresponds to exactly one dimension of the vector of regression coefficients.  </p>
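<p>A minimal Python sketch of that feature/coefficient correspondence, under stated assumptions: the feature templates, the tag names BOS, PER and O, and the helpers extract_features and sequence_score are hypothetical illustrations, not LingPipe&#8217;s API. Each feature string indexes exactly one coefficient; features may look at the previous tag, the current tag, and any of the tokens; and the unnormalized log score of a tag sequence is the sum of the coefficients of its active features.</p>

```python
# Hypothetical feature extractor for a chain CRF (not LingPipe's API).
# Features may depend on the previous tag, the current tag, and any tokens,
# but never on tags further away than the adjacent pair.
def extract_features(tokens, position, prev_tag, tag):
    word = tokens[position]
    feats = [
        "TRANS:" + prev_tag + "->" + tag,              # adjacent tag pair
        "WORD:" + word.lower() + "|" + tag,            # current token
        "CAP:" + str(word[:1].isupper()) + "|" + tag,  # capitalization
    ]
    if position + 1 != len(tokens):
        # context from the next token
        feats.append("NEXT:" + tokens[position + 1].lower() + "|" + tag)
    return feats


def sequence_score(tokens, tags, weights, start_tag="BOS"):
    # Unnormalized log score: add one coefficient per active feature.
    # Features missing from the weights dict contribute 0.
    total = 0.0
    prev_tag = start_tag
    for position, tag in enumerate(tags):
        for feat in extract_features(tokens, position, prev_tag, tag):
            total += weights.get(feat, 0.0)
        prev_tag = tag
    return total
```

<p>Using strings as feature names keeps the sketch short; a real implementation would typically map them to integer indices into a dense coefficient array.</p>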
<p>What makes CRFs computationally challenging is making sure p(tags|words) sums to 1.0 over the exponentially sized set of possible tag sequences. In practice, you usually start with something proportional to p(tags|words) and then normalize by dividing by its sum over all tag sequences.  The challenge is that there are exponentially many sequences of tags.  The good news is that you can use a version of forward/backward (a dynamic programming algorithm whose run time is linear in the length of the input) to compute the sum efficiently.</p>
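<p>As a hedged illustration of the normalization, here is a Python sketch that assumes the scores have already been computed as log potentials: start[t] for tag t at the first position, trans[s][t] for tag s followed by tag t, and emit[i][t] for tag t at position i (these names are hypothetical). The forward recursion computes log Z, the log of the sum of exp(score) over all tag sequences, in O(nK^2) time for n tokens and K tags; the brute-force version enumerates all K^n sequences and is only there as a check.</p>

```python
import itertools
import math


def log_sum_exp(values):
    # Numerically stable log(sum(exp(v) for v in values)).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(tags, start, trans, emit):
    # Unnormalized log score of one tag sequence.
    s = start[tags[0]] + emit[0][tags[0]]
    for i in range(1, len(tags)):
        s += trans[tags[i - 1]][tags[i]] + emit[i][tags[i]]
    return s


def log_partition_forward(start, trans, emit):
    # alpha[t] = log of summed exp(score) over all prefixes ending in tag t.
    num_tags = len(start)
    alpha = [start[t] + emit[0][t] for t in range(num_tags)]
    for i in range(1, len(emit)):
        alpha = [
            log_sum_exp([alpha[s] + trans[s][t] for s in range(num_tags)]) + emit[i][t]
            for t in range(num_tags)
        ]
    return log_sum_exp(alpha)


def log_partition_brute_force(start, trans, emit):
    # Exponential-time check: enumerate every tag sequence.
    num_tags = len(start)
    scores = [sequence_score(tags, start, trans, emit)
              for tags in itertools.product(range(num_tags), repeat=len(emit))]
    return log_sum_exp(scores)


def sequence_log_prob(tags, start, trans, emit):
    # log p(tags|words) = score - log Z
    return sequence_score(tags, start, trans, emit) - log_partition_forward(start, trans, emit)
```

<p>Dividing exp(score) by Z (subtracting log Z in log space) is the normalization step; working in log space with log-sum-exp avoids overflow on long inputs.</p>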
]]></content:encoded>
	</item>
	<item>
		<title>By: Bernhard</title>
		<link>http://lingpipe-blog.com/2009/06/18/chainconditional-random-fields-implementation-and-design/#comment-4947</link>
		<dc:creator><![CDATA[Bernhard]]></dc:creator>
		<pubDate>Thu, 25 Jun 2009 13:15:23 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1611#comment-4947</guid>
		<description><![CDATA[Just trying to get a handle on the specialized wording which is identical in almost all tutorials.

What I understand from your explanation is that CRF generates features in a fashion similar to logistic regression, that is, binomial probabilities from an arbitrary number of regression coefficients (betas). If so, then thanks for that. This helps.

Would you mind elaborating a bit on what you mean by &#039;normalization?&#039;]]></description>
		<content:encoded><![CDATA[<p>Just trying to get a handle on the specialized wording which is identical in almost all tutorials.</p>
<p>What I understand from your explanation is that CRF generates features in a fashion similar to logistic regression, that is, binomial probabilities from an arbitrary number of regression coefficients (betas). If so, then thanks for that. This helps.</p>
<p>Would you mind elaborating a bit on what you mean by &#8216;normalization?&#8217;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2009/06/18/chainconditional-random-fields-implementation-and-design/#comment-4943</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Wed, 24 Jun 2009 16:24:45 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1611#comment-4943</guid>
		<description><![CDATA[HMMs are generative, modeling p(words,tags) and using Bayes&#039; rule to estimate p(tags&#124;words), whereas CRFs are conditional, modeling p(tags&#124;words) directly.

HMMs model label transitions as a multinomial p(tag&#124;previousTag) per previous tag.  In NLP, emissions are also typically multinomial (like naive Bayes) for p(word&#124;tag).  


CRFs are more like logistic regression in that they extract features from the input words using arbitrary amounts of context, outside knowledge, etc.  The words are explanatory variables and are not themselves modeled.  The chain part means you can also extract features from pairs of adjacent labels plus the context.  The key differentiator of CRFs is that they normalize globally, over all possible tag sequences for the whole input, rather than locally per tag.  

Really, though, you should read one of the tutorials.]]></description>
		<content:encoded><![CDATA[<p>HMMs are generative, modeling p(words,tags) and using Bayes&#8217; rule to estimate p(tags|words), whereas CRFs are conditional, modeling p(tags|words) directly.</p>
<p>HMMs model label transitions as a multinomial p(tag|previousTag) per previous tag.  In NLP, emissions are also typically multinomial (like naive Bayes) for p(word|tag).  </p>
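<p>To make the generative side concrete, here is a Python sketch of a toy HMM with made-up probabilities (the tags N and V, the two-word vocabulary, and all numbers are hypothetical). The joint p(words,tags) is a product of multinomial transition and emission probabilities, and p(tags|words) follows by Bayes&#8217; rule, dividing by p(words). For clarity the sketch sums over tag sequences by brute force; in practice the forward algorithm computes p(words) efficiently.</p>

```python
import itertools

# Hypothetical toy HMM: all numbers are made up for illustration.
TAGS = ["N", "V"]
INIT = {"N": 0.7, "V": 0.3}                 # p(first tag)
TRANS = {"N": {"N": 0.4, "V": 0.6},         # p(tag | previousTag)
         "V": {"N": 0.8, "V": 0.2}}
EMIT = {"N": {"fish": 0.6, "swim": 0.4},    # p(word | tag)
        "V": {"fish": 0.3, "swim": 0.7}}


def joint(words, tags):
    # p(words, tags): product of multinomial transitions and emissions.
    p = INIT[tags[0]] * EMIT[tags[0]][words[0]]
    for i in range(1, len(words)):
        p *= TRANS[tags[i - 1]][tags[i]] * EMIT[tags[i]][words[i]]
    return p


def posterior(words, tags):
    # Bayes' rule: p(tags|words) = p(words, tags) / p(words),
    # where p(words) sums the joint over every possible tag sequence.
    evidence = sum(joint(words, t) for t in itertools.product(TAGS, repeat=len(words)))
    return joint(words, tags) / evidence
```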
<p>CRFs are more like logistic regression in that they extract features from the input words using arbitrary amounts of context, outside knowledge, etc.  The words are explanatory variables and are not themselves modeled.  The chain part means you can also extract features from pairs of adjacent labels plus the context.  The key differentiator of CRFs is that they normalize globally, over all possible tag sequences for the whole input, rather than locally per tag.  </p>
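<p>A small Python sketch of the difference between global and per-tag normalization, assuming log potentials in the hypothetical arrays start, trans and emit. The per-tag variant is a maximum-entropy Markov model (MEMM) style softmax at each position, which the comment does not name; it is included only for contrast. With the made-up scores in the note after the code, the two models assign different probabilities to the same tag sequence even though they share every score.</p>

```python
import itertools
import math


def seq_score(tags, start, trans, emit):
    # Unnormalized log score of one tag sequence.
    s = start[tags[0]] + emit[0][tags[0]]
    for i in range(1, len(tags)):
        s += trans[tags[i - 1]][tags[i]] + emit[i][tags[i]]
    return s


def global_prob(tags, start, trans, emit):
    # CRF-style: a single normalizer Z summed over every tag sequence for the whole input.
    num_tags, n = len(start), len(emit)
    z = sum(math.exp(seq_score(t, start, trans, emit))
            for t in itertools.product(range(num_tags), repeat=n))
    return math.exp(seq_score(tags, start, trans, emit)) / z


def local_prob(tags, start, trans, emit):
    # Per-tag (MEMM-style) normalization: a softmax at each position given the previous tag.
    num_tags = len(start)
    first = [start[t] + emit[0][t] for t in range(num_tags)]
    p = math.exp(first[tags[0]]) / sum(math.exp(v) for v in first)
    for i in range(1, len(tags)):
        row = [trans[tags[i - 1]][t] + emit[i][t] for t in range(num_tags)]
        p *= math.exp(row[tags[i]]) / sum(math.exp(v) for v in row)
    return p
```

<p>For example, with start = [0, 0], trans = [[0, 0], [2, 2]] and emit = [[0, 0], [0, 0]], local_prob gives the sequence (0, 0) probability 0.25, while global_prob gives 1/(2 + 2e^2), about 0.06: only global normalization lets the strong scores following tag 1 pull probability mass toward tag 1 at the first position.</p>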
<p>Really, though, you should read one of the tutorials.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bernhard</title>
		<link>http://lingpipe-blog.com/2009/06/18/chainconditional-random-fields-implementation-and-design/#comment-4941</link>
		<dc:creator><![CDATA[Bernhard]]></dc:creator>
		<pubDate>Wed, 24 Jun 2009 14:54:42 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1611#comment-4941</guid>
		<description><![CDATA[It&#039;s hard to understand what a &#039;feature&#039; is in CRFs. In HMMs it&#039;s essentially a word/token. What is it in CRFs? Can you help out with an intuitive explanation?]]></description>
		<content:encoded><![CDATA[<p>It&#8217;s hard to understand what a &#8216;feature&#8217; is in CRFs. In HMMs it&#8217;s essentially a word/token. What is it in CRFs? Can you help out with an intuitive explanation?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lingpipe</title>
		<link>http://lingpipe-blog.com/2009/06/18/chainconditional-random-fields-implementation-and-design/#comment-4928</link>
		<dc:creator><![CDATA[lingpipe]]></dc:creator>
		<pubDate>Tue, 23 Jun 2009 03:36:58 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1611#comment-4928</guid>
		<description><![CDATA[I&#039;ve glanced through the source for Mallet, JavaNLP, Carafe, and read most of the papers.  I cite the Sutton and McCallum tutorial above -- it goes into the most implementation detail.

I&#039;m not the world&#039;s best reader of other people&#039;s code (that&#039;d be Ezra), but CRF implementations are particularly hard to follow because they combine I/O, tokenization, serialization, feature extraction, forward-backward, and some kind of regularized logistic regression optimizer.  The research packages tend to provide lots of options because they&#039;ve been doing research on these things for years.  I get lost just at the &quot;what exactly are these input types&quot; stage with Mallet and JavaNLP. 

I&#039;m also not sure the people who implemented this stuff wouldn&#039;t do it differently given a second chance.]]></description>
		<content:encoded><![CDATA[<p>I&#8217;ve glanced through the source for Mallet, JavaNLP, Carafe, and read most of the papers.  I cite the Sutton and McCallum tutorial above &#8212; it goes into the most implementation detail.</p>
<p>I&#8217;m not the world&#8217;s best reader of other people&#8217;s code (that&#8217;d be Ezra), but CRF implementations are particularly hard to follow because they combine I/O, tokenization, serialization, feature extraction, forward-backward, and some kind of regularized logistic regression optimizer.  The research packages tend to provide lots of options because they&#8217;ve been doing research on these things for years.  I get lost just at the &#8220;what exactly are these input types&#8221; stage with Mallet and JavaNLP. </p>
<p>I&#8217;m also not sure the people who implemented this stuff wouldn&#8217;t do it differently given a second chance.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff</title>
		<link>http://lingpipe-blog.com/2009/06/18/chainconditional-random-fields-implementation-and-design/#comment-4926</link>
		<dc:creator><![CDATA[Jeff]]></dc:creator>
		<pubDate>Tue, 23 Jun 2009 01:22:44 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=1611#comment-4926</guid>
		<description><![CDATA[Have you looked at the &lt;a href=&quot;http://mallet.cs.umass.edu/&quot; rel=&quot;nofollow&quot;&gt;Mallet&lt;/a&gt; implementation?  It&#039;s one of the standard implementations from Andrew McCallum&#039;s lab.]]></description>
		<content:encoded><![CDATA[<p>Have you looked at the <a href="http://mallet.cs.umass.edu/" rel="nofollow">Mallet</a> implementation?  It&#8217;s one of the standard implementations from Andrew McCallum&#8217;s lab.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

