<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: The Entropy of English vs. Chinese</title>
	<atom:link href="http://lingpipe-blog.com/2008/04/11/the-entropy-of-english-vs-chinese/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2008/04/11/the-entropy-of-english-vs-chinese/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Sat, 04 Feb 2012 20:56:48 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: tiflo</title>
		<link>http://lingpipe-blog.com/2008/04/11/the-entropy-of-english-vs-chinese/#comment-4304</link>
		<dc:creator><![CDATA[tiflo]]></dc:creator>
		<pubDate>Wed, 25 Mar 2009 06:22:46 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=93#comment-4304</guid>
		<description><![CDATA[Hi Bob, 

somebody pointed me to this page and I thought you might be interested in work one of the students here at Rochester, Ting Qian, has been doing on constant entropy rate (distribution of entropy throughout discourses) in Chinese. There is a paper under submission with written and spoken Chinese in it and a replication of Genzel and Charniak 2002 on English (using an approach that provides better control for potential confounds). The paper should soon be available at: http://www.tqian.org/pub.html, but I can forward you a copy of the paper, if you&#039;re interested. It&#039;s not quite what you were talking about, but closely related.]]></description>
		<content:encoded><![CDATA[<p>Hi Bob, </p>
<p>somebody pointed me to this page and I thought you might be interested in work one of the students here at Rochester, Ting Qian, has been doing on constant entropy rate (distribution of entropy throughout discourses) in Chinese. There is a paper under submission with written and spoken Chinese in it and a replication of Genzel and Charniak 2002 on English (using an approach that provides better control for potential confounds). The paper should soon be available at: <a href="http://www.tqian.org/pub.html" rel="nofollow">http://www.tqian.org/pub.html</a>, but I can forward you a copy of the paper, if you&#8217;re interested. It&#8217;s not quite what you were talking about, but closely related.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://lingpipe-blog.com/2008/04/11/the-entropy-of-english-vs-chinese/#comment-2108</link>
		<dc:creator><![CDATA[Bob Carpenter]]></dc:creator>
		<pubDate>Mon, 14 Apr 2008 19:19:46 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=93#comment-2108</guid>
		<description><![CDATA[Dylan: Good point about the channel side of the problem (converting speech to characters); I was only thinking of the source side of the noisy channel (what sequences of characters are likely).  It really is the whole source/channel system that should be measured for efficiency.  

Transduction can go either way between sounds and characters, depending on whether you&#039;re listening (speech recognition) or producing (synthesis).  If you view the channel as carrying characters encoded as sounds, then you have p(characters) and p(sounds&#124;characters) as the source and channel models.  If you view the channel as carrying sounds encoded as characters, you have p(sounds) and p(characters&#124;sounds), respectively. 

The issue of &quot;entropy of language X&quot; is about just the source, p(characters) or p(sounds) depending on whether you&#039;re modeling the language as sequences of characters or sequences of sounds.

The truly interesting case is to take the source as &quot;meaning&quot; and the channel as carrying either characters or sounds.  We don&#039;t quite have a handle on that yet, I&#039;m afraid.

It&#039;d be interesting if the various versions of Chinese (e.g. traditional vs. simplified vs. ancient -- my knowledge of Chinese is very limited) changed the encoding rates.  It&#039;d be the clearest example of the point Mark Liberman was trying to make in the Language Log post.]]></description>
		<content:encoded><![CDATA[<p>Dylan: Good point about the channel side of the problem (converting speech to characters); I was only thinking of the source side of the noisy channel (what sequences of characters are likely).  It really is the whole source/channel system that should be measured for efficiency.  </p>
<p>Transduction can go either way between sounds and characters, depending on whether you&#8217;re listening (speech recognition) or producing (synthesis).  If you view the channel as carrying characters encoded as sounds, then you have p(characters) and p(sounds|characters) as the source and channel models.  If you view the channel as carrying sounds encoded as characters, you have p(sounds) and p(characters|sounds), respectively. </p>
<p>The issue of &#8220;entropy of language X&#8221; is about just the source, p(characters) or p(sounds) depending on whether you&#8217;re modeling the language as sequences of characters or sequences of sounds.</p>
<p>The truly interesting case is to take the source as &#8220;meaning&#8221; and the channel as carrying either characters or sounds.  We don&#8217;t quite have a handle on that yet, I&#8217;m afraid.</p>
<p>It&#8217;d be interesting if the various versions of Chinese (e.g. traditional vs. simplified vs. ancient &#8212; my knowledge of Chinese is very limited) changed the encoding rates.  It&#8217;d be the clearest example of the point Mark Liberman was trying to make in the Language Log post.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dylan Thurston</title>
		<link>http://lingpipe-blog.com/2008/04/11/the-entropy-of-english-vs-chinese/#comment-2106</link>
		<dc:creator><![CDATA[Dylan Thurston]]></dc:creator>
		<pubDate>Mon, 14 Apr 2008 01:12:50 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe.wordpress.com/?p=93#comment-2106</guid>
		<description><![CDATA[You talk about spelling distinctions in English that are not present in speech, but oddly don&#039;t mention the much more prevalent distinctions in Chinese that are not present in speech: there are generally many written characters representing a given spoken syllable.  A native speaker has no trouble reconstructing the character from speech (except for some names), but my understanding is that modern written Chinese uses somewhat fewer characters because of the extra distinctions available.  (Ancient written Chinese was much more compressed.)  This surely affects compressibility.]]></description>
		<content:encoded><![CDATA[<p>You talk about spelling distinctions in English that are not present in speech, but oddly don&#8217;t mention the much more prevalent distinctions in Chinese that are not present in speech: there are generally many written characters representing a given spoken syllable.  A native speaker has no trouble reconstructing the character from speech (except for some names), but my understanding is that modern written Chinese uses somewhat fewer characters because of the extra distinctions available.  (Ancient written Chinese was much more compressed.)  This surely affects compressibility.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

