<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://NadeauSoftware.com" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>PHP</title>
 <link>http://NadeauSoftware.com/articles/php</link>
 <description>PHP articles from NadeauSoftware.com.</description>
 <language>en</language>
<item>
 <title>PHP tip: How to convert a relative URL to an absolute URL</title>
 <link>http://NadeauSoftware.com/articles/2008/05/php_tip_how_convert_relative_url_absolute_url</link>
 <description>&lt;p class=&quot;summary&quot;&gt;An &lt;em&gt;absolute&lt;/em&gt; URL is complete and can be used as-is to download a web file. But web pages often include incomplete &lt;em&gt;relative&lt;/em&gt; URLs with missing parts, such as the &amp;quot;http&amp;quot; scheme, the host name, or the first part of a file path. These parts need to be filled in by copying them from a &lt;em&gt;base&lt;/em&gt; absolute URL. This article shows how and includes code to do it.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2008/05/php_tip_how_convert_relative_url_absolute_url&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2008/05/php_tip_how_convert_relative_url_absolute_url#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Tue, 27 May 2008 21:39:08 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">79 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to decode HTML entities on a web page</title>
 <link>http://NadeauSoftware.com/articles/2007/06/php_tip_how_decode_html_entities_web_page</link>
 <description>&lt;p class=&quot;summary&quot;&gt;HTML entities encode special characters and symbols, such as &lt;code&gt;&amp;amp;euro;&lt;/code&gt; for &amp;euro;, or &lt;code&gt;&amp;amp;copy;&lt;/code&gt; for &amp;copy;. When building a PHP search engine or web page analysis tool, you must decode the HTML entities within a page into single characters to get clean, parsable text. PHP’s standard &lt;code&gt;html_entity_decode()&lt;/code&gt; function will do the job, but you must use a character encoding rich enough to represent every entity, such as UTF-8, and work with multibyte character strings. This tip shows how.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2007/06/php_tip_how_decode_html_entities_web_page&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2007/06/php_tip_how_decode_html_entities_web_page#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Sat, 30 Jun 2007 07:27:46 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">54 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to extract keywords from a web page</title>
 <link>http://NadeauSoftware.com/articles/2008/04/php_tip_how_extract_keywords_web_page</link>
 <description>&lt;p class=&quot;summary&quot;&gt;Web page keywords characterize the page&#039;s topic for a search engine. Extracting keywords requires that you recognize the page&#039;s character encoding, strip away HTML tags, scripts, and styles, decode HTML entities, and remove unwanted punctuation, symbols, numbers, and stop words. This article shows how.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2008/04/php_tip_how_extract_keywords_web_page&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2008/04/php_tip_how_extract_keywords_web_page#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Sun, 13 Apr 2008 10:12:00 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">76 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to extract URLs from a CSS file</title>
 <link>http://NadeauSoftware.com/articles/2008/01/php_tip_how_extract_urls_css_file</link>
 <description>&lt;p class=&quot;summary&quot;&gt;HTML is usually the focus when extracting URLs for a link checker or analysis tool, but CSS files include URLs too. The CSS &lt;code&gt;@import&lt;/code&gt; rule uses a URL to include another CSS file, and many style properties include a URL to load an image or other content. This tip shows how to scan a CSS file and extract its URLs.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2008/01/php_tip_how_extract_urls_css_file&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2008/01/php_tip_how_extract_urls_css_file#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Thu, 03 Jan 2008 11:12:49 -0800</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">71 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to extract URLs from a web page</title>
 <link>http://NadeauSoftware.com/articles/2008/01/php_tip_how_extract_urls_web_page</link>
 <description>&lt;p class=&quot;summary&quot;&gt;URL extraction is at the core of link checkers, search engine spiders, and a variety of web page analysis tools. While &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; elements are the primary sources of URLs, there are more than 70 element attributes with URLs in HTML, XHTML, WML, and assorted HTML extensions. This tip shows how to extract URLs from all of these.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2008/01/php_tip_how_extract_urls_web_page&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2008/01/php_tip_how_extract_urls_web_page#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Thu, 03 Jan 2008 11:20:03 -0800</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">72 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to get a web page content type</title>
 <link>http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_content_type</link>
 <description>&lt;p class=&quot;summary&quot;&gt;A web page’s content type tells you the page&#039;s MIME type (such as “text/html” or “image/png”) and the character set used for the page text. You&#039;ll need the character set to interpret the page&#039;s characters when processing its text for a search engine or keyword extractor. The content type should be in the web server’s HTTP header for the page, but it can also be set in an HTML file’s &lt;code&gt;&amp;lt;meta&amp;gt;&lt;/code&gt; tag, or an XML file’s &lt;code&gt;&amp;lt;?xml ?&amp;gt;&lt;/code&gt; declaration. This tip shows how to get the page&#039;s content type and extract the MIME type and character set.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_content_type&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_content_type#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Sat, 16 Jun 2007 07:03:06 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">53 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to get a web page using CURL</title>
 <link>http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_using_curl</link>
 <description>&lt;p class=&quot;summary&quot;&gt;The first step when building a PHP search engine, link checker, or keyword extractor is to get the web page from the web server. There are several ways to do this. From PHP 4 onwards, the most flexible way uses PHP’s CURL (Client URL) functions. This tip shows how.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_using_curl&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2007/06/php_tip_how_get_web_page_using_curl#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Sun, 10 Jun 2007 06:02:40 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">52 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to get a web page using the fopen wrappers</title>
 <link>http://NadeauSoftware.com/articles/2007/07/php_tip_how_get_web_page_using_fopen_wrappers</link>
 <description>&lt;p class=&quot;summary&quot;&gt;PHP’s fopen wrappers enable the standard file functions to read web pages from a web server. A few additional calls are needed to set parameters for a web server request and to get the server’s HTTP response header. This tip shows how. &lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2007/07/php_tip_how_get_web_page_using_fopen_wrappers&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2007/07/php_tip_how_get_web_page_using_fopen_wrappers#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Sat, 14 Jul 2007 07:30:43 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">57 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to parse and build URLs</title>
 <link>http://NadeauSoftware.com/articles/2008/05/php_tip_how_parse_and_build_urls</link>
 <description>&lt;p class=&quot;summary&quot;&gt;Splitting apart and rebuilding URLs is essential for link checkers, phishing detectors, spiders, and so on. PHP&#039;s standard &lt;code&gt;parse_url(&amp;nbsp;)&lt;/code&gt; function works well for parsing simple URLs, but it has problems with complex and relative URLs. And once a URL is split apart, there is no standard PHP function to reassemble it properly. This article reviews the official syntax of URLs, discusses URL parsing complexities, and provides new PHP functions to split apart a URL and join its parts together again.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2008/05/php_tip_how_parse_and_build_urls&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2008/05/php_tip_how_parse_and_build_urls#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Tue, 27 May 2008 21:35:20 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">78 at http://NadeauSoftware.com</guid>
</item>
<item>
 <title>PHP tip: How to strip HTML tags, scripts, and styles from a web page</title>
 <link>http://NadeauSoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page</link>
 <description>&lt;p class=&quot;summary&quot;&gt;The HTML tags on a web page must be stripped away to get clean text for a PHP search engine, keyword extractor, or other page analysis tool. PHP&#039;s standard &lt;code&gt;strip_tags(&amp;nbsp;)&lt;/code&gt; function will do part of the job, but you first need to strip out styles, scripts, embedded objects, and other unwanted page code. This tip shows how.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://NadeauSoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://NadeauSoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page#comments</comments>
 <category domain="http://NadeauSoftware.com/articles/php">PHP</category>
 <category domain="http://NadeauSoftware.com/articles/text_processing">Text processing</category>
 <pubDate>Sat, 01 Sep 2007 20:04:39 -0700</pubDate>
 <dc:creator>Dave_Nadeau</dc:creator>
 <guid isPermaLink="false">60 at http://NadeauSoftware.com</guid>
</item>
</channel>
</rss>
