<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://NadeauSoftware.com" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>NadeauSoftware.com articles from June, 2007</title>
 <link>http://NadeauSoftware.com/articles/2007/06</link>
 <description>A list of articles, sorted by title.</description>
 <language>en</language>
<item>
 <title>PHP tip: How to decode HTML entities on a web page</title>
 <link>http://NadeauSoftware.com/articles/2007/06/php_tip_how_decode_html_entities_web_page</link>
 <description>&lt;p class=&quot;summary&quot;&gt;HTML entities encode special characters and symbols, such as &lt;code&gt;&amp;amp;euro;&lt;/code&gt; for &amp;euro;, or &lt;code&gt;&amp;amp;copy;&lt;/code&gt; for &amp;copy;. When building a PHP search engine or web page analysis tool, HTML entities within a page must be decoded into single characters to get clean, parsable text. PHP’s standard &lt;code&gt;html_entity_decode()&lt;/code&gt; function will do the job, but you must use a rich character encoding, such as UTF-8, together with multibyte character strings. This tip shows how.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2007/06/php_tip_how_decode_html_entities_web_page&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2007/06/php_tip_how_decode_html_entities_web_page#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Sat, 30 Jun 2007 07:27:46 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">54 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to get a web page content type</title>
 <link>http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_content_type</link>
 <description>&lt;p class=&quot;summary&quot;&gt;A web page’s content type gives the page&#039;s MIME type (such as “text/html” or “image/png”) and the character set used by the page&#039;s text. You&#039;ll need the character set to interpret the page&#039;s characters when processing text for a search engine or keyword extractor. The content type should be in the web server’s HTTP header for the page, but it can also be set in an HTML file’s &lt;code&gt;&amp;lt;meta&amp;gt;&lt;/code&gt; tag or an XML file’s &lt;code&gt;&amp;lt;?xml&amp;gt;&lt;/code&gt; declaration. This tip shows how to get the page&#039;s content type and extract the MIME type and character set.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_content_type&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_content_type#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Sat, 16 Jun 2007 07:03:06 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">53 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to get a web page using CURL</title>
 <link>http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_using_curl</link>
 <description>&lt;p class=&quot;summary&quot;&gt;The first step when building a PHP search engine, link checker, or keyword extractor is to get the web page from the web server. There are several ways to do this. From PHP 4 onwards, the most flexible way uses PHP’s CURL (Client URL) functions. This tip shows how.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_using_curl&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_using_curl#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Sun, 10 Jun 2007 06:02:40 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">52 at http://NadeauSoftware.com</guid>
</item>
</channel>
</rss>
