<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Blegging for Help: Web Scraping for Content?</title>
	<atom:link href="http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/feed/" rel="self" type="application/rss+xml" />
	<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/</link>
	<description>Natural Language Processing and Text Analytics</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:47:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: fminer</title>
		<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/#comment-15122</link>
		<dc:creator><![CDATA[fminer]]></dc:creator>
		<pubDate>Wed, 22 Jun 2011 04:27:05 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=3277#comment-15122</guid>
		<description><![CDATA[You can try our production web scraping tool, FMiner: http://www.fminer.com .
We will now build a FREE extraction project for every new user.]]></description>
		<content:encoded><![CDATA[<p>You can try our production web scraping tool, FMiner: <a href="http://www.fminer.com" rel="nofollow">http://www.fminer.com</a> .<br />
We will now build a FREE extraction project for every new user.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: What are some ways to extract the main text from an blog entry using Python? - Quora</title>
		<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/#comment-12169</link>
		<dc:creator><![CDATA[What are some ways to extract the main text from an blog entry using Python? - Quora]]></dc:creator>
		<pubDate>Sun, 26 Dec 2010 19:13:35 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=3277#comment-12169</guid>
		<description><![CDATA[[...] Stenström There are lots of good suggestions in the comment to this blog post: http://lingpipe-blog.com/2010/01... [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Stenström There are lots of good suggestions in the comment to this blog post: <a href="http://lingpipe-blog.com/2010/01..." rel="nofollow">http://lingpipe-blog.com/2010/01&#8230;</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Web text extraction systems: How to get the main text of an arbitrary web page &#124; FZ Blogs</title>
		<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/#comment-8079</link>
		<dc:creator><![CDATA[Web text extraction systems: How to get the main text of an arbitrary web page &#124; FZ Blogs]]></dc:creator>
		<pubDate>Thu, 16 Sep 2010 19:05:37 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=3277#comment-8079</guid>
		<description><![CDATA[[...] I have seen that the developer of the famous Lingpipe software also looked for a similar thing: Blegging for Help: Web Scraping for Content? [...]]]></description>
		<content:encoded><![CDATA[<p>[...] I have seen that the developer of the famous Lingpipe software also looked for a similar thing: Blegging for Help: Web Scraping for Content? [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dori Stein</title>
		<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/#comment-8077</link>
		<dc:creator><![CDATA[Dori Stein]]></dc:creator>
		<pubDate>Thu, 16 Sep 2010 15:14:17 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=3277#comment-8077</guid>
		<description><![CDATA[I think you should also read about scraping tools and how to compare them on http://www.fornova.net/blog/?p=18]]></description>
		<content:encoded><![CDATA[<p>I think you should also read about scraping tools and how to compare them on <a href="http://www.fornova.net/blog/?p=18" rel="nofollow">http://www.fornova.net/blog/?p=18</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emre Sevinç</title>
		<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/#comment-7792</link>
		<dc:creator><![CDATA[Emre Sevinç]]></dc:creator>
		<pubDate>Mon, 23 Aug 2010 17:52:06 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=3277#comment-7792</guid>
		<description><![CDATA[I&#039;m trying Christian Kohlschütter&#039;s boilerpipe code with Dutch and Turkish news sites, magazines and blogs. So far I&#039;m satisfied with the results. Thanks for making it open source!]]></description>
		<content:encoded><![CDATA[<p>I&#8217;m trying Christian Kohlschütter&#8217;s boilerpipe code with Dutch and Turkish news sites, magazines and blogs. So far I&#8217;m satisfied with the results. Thanks for making it open source!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vít Baisa</title>
		<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/#comment-7787</link>
		<dc:creator><![CDATA[Vít Baisa]]></dc:creator>
		<pubDate>Mon, 23 Aug 2010 14:34:16 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=3277#comment-7787</guid>
		<description><![CDATA[I knew why I should write my thesis in English instead of Czech. :) I am glad to have your feedback. If you are still interested, please don&#039;t hesitate to contact me via email (google &quot;baisa vít muni&quot;) and we will surely make ourselves understood. :)]]></description>
		<content:encoded><![CDATA[<p>I knew why I should write my thesis in English instead of Czech. :) I am glad to have your feedback. If you are still interested, please don&#8217;t hesitate to contact me via email (google &#8220;baisa vít muni&#8221;) and we will surely make ourselves understood. :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emre Sevinç</title>
		<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/#comment-7786</link>
		<dc:creator><![CDATA[Emre Sevinç]]></dc:creator>
		<pubDate>Mon, 23 Aug 2010 14:05:22 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=3277#comment-7786</guid>
		<description><![CDATA[&gt; Ok. I might package up the code I’ve used and release it GPL’d on my 
&gt; website.

I&#039;d definitely give it a try. Can you share your code?]]></description>
		<content:encoded><![CDATA[<p>&gt; Ok. I might package up the code I’ve used and release it GPL’d on my<br />
&gt; website.</p>
<p>I&#8217;d definitely give it a try. Can you share your code?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emre Sevinç</title>
		<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/#comment-7785</link>
		<dc:creator><![CDATA[Emre Sevinç]]></dc:creator>
		<pubDate>Mon, 23 Aug 2010 14:02:29 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=3277#comment-7785</guid>
		<description><![CDATA[Vít Baisa,

I&#039;d definitely like to have a copy of your software. Is it open source?]]></description>
		<content:encoded><![CDATA[<p>Vít Baisa,</p>
<p>I&#8217;d definitely like to have a copy of your software. Is it open source?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris</title>
		<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/#comment-6720</link>
		<dc:creator><![CDATA[Chris]]></dc:creator>
		<pubDate>Tue, 20 Apr 2010 02:32:46 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=3277#comment-6720</guid>
		<description><![CDATA[I&#039;ve spent the past nine months studying (and waiting out) the market to see what products would best sell via the Internet.  I&#039;ve come across a few price and product scraping services that all seemed just about the same with pricing and results... though some simply provided the data in a collective dump while others provided a more readable format for additional cost.  

The one that seems most attractive to us at this time is Mozenda.  They are the least expensive for the same level of results as a number of others.  If you don&#039;t have budget issues, there are a few that can definitely provide more robust and powerful results.  Those are worth considering down the road for us, just not yet.

Good luck!]]></description>
		<content:encoded><![CDATA[<p>I&#8217;ve spent the past nine months studying (and waiting out) the market to see what products would best sell via the Internet.  I&#8217;ve come across a few price and product scraping services that all seemed just about the same with pricing and results&#8230; though some simply provided the data in a collective dump while others provided a more readable format for additional cost.  </p>
<p>The one that seems most attractive to us at this time is Mozenda.  They are the least expensive for the same level of results as a number of others.  If you don&#8217;t have budget issues, there are a few that can definitely provide more robust and powerful results.  Those are worth considering down the road for us, just not yet.</p>
<p>Good luck!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Keyvan</title>
		<link>http://lingpipe-blog.com/2010/01/06/blegging-for-help-web-scraping-for-content/#comment-6703</link>
		<dc:creator><![CDATA[Keyvan]]></dc:creator>
		<pubDate>Wed, 07 Apr 2010 10:20:09 +0000</pubDate>
		<guid isPermaLink="false">http://lingpipe-blog.com/?p=3277#comment-6703</guid>
		<description><![CDATA[Someone mentioned Readability. I ported it a while ago to PHP and have a web-service available which tries to pick out content from HTML pages: http://fivefilters.org/content-only/

There&#039;s a newer version released late January which is supposed to be more accurate: http://blog.arc90.com/2010/01/26/introducing-readability-1-5/]]></description>
		<content:encoded><![CDATA[<p>Someone mentioned Readability. I ported it a while ago to PHP and have a web-service available which tries to pick out content from HTML pages: <a href="http://fivefilters.org/content-only/" rel="nofollow">http://fivefilters.org/content-only/</a></p>
<p>There&#8217;s a newer version released late January which is supposed to be more accurate: <a href="http://blog.arc90.com/2010/01/26/introducing-readability-1-5/" rel="nofollow">http://blog.arc90.com/2010/01/26/introducing-readability-1-5/</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>

