Most symbol characters, like + = © ™ ← → ☺ ♣ ♠, need to be stripped out of web page text before processing it in a search engine or text analysis tool. International text contains thousands of symbol characters, and some should be removed in one context but kept in another. This tip shows how.
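As a rough illustration of the idea, the sketch below uses PCRE Unicode property classes to drop symbol characters. The function name `strip_symbols()` and its `$keepCurrency` flag are hypothetical, chosen only for this example, and the input is assumed to be valid UTF-8. Keeping currency signs stands in for the "one context but not another" case.

```php
<?php
// Hypothetical helper: replace Unicode symbol characters with spaces.
// Assumes $text is valid UTF-8.
function strip_symbols(string $text, bool $keepCurrency = false): string
{
    // \p{S} matches every symbol category: math (Sm), currency (Sc),
    // modifier (Sk), and other (So). To keep currency signs, list the
    // remaining three categories explicitly.
    $pattern = $keepCurrency ? '/[\p{Sm}\p{Sk}\p{So}]/u' : '/\p{S}/u';

    // Replace with a space so neighboring words do not run together.
    $text = preg_replace($pattern, ' ', $text) ?? '';

    // Collapse the runs of whitespace left behind.
    $text = preg_replace('/\s+/u', ' ', $text) ?? '';

    return trim($text);
}
```

For example, `strip_symbols('Price: $5 ™')` yields `Price: 5`, while `strip_symbols('Price: $5 ™', true)` yields `Price: $5`.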
When processing text for a search engine or analysis tool, code needs to strip out punctuation, formatting, spacing, and control characters to reveal indexable text. International text contains hundreds of these characters, and some should be removed in one context but not in another. This tip shows how.
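One possible approach, sketched here under assumptions, again relies on Unicode property classes. The function name `strip_punctuation()` is hypothetical, the input is assumed to be valid UTF-8, and the context rule (keep punctuation only when it sits between two letters or digits, as in "It's" or "3.14") is just one example policy.

```php
<?php
// Hypothetical helper: strip punctuation, control, formatting, and
// extra spacing characters. Assumes $text is valid UTF-8.
function strip_punctuation(string $text): string
{
    // Formatting characters (Cf), such as zero-width spaces, are
    // invisible, so delete them outright.
    $text = preg_replace('/\p{Cf}/u', '', $text) ?? '';

    // Control characters (Cc), such as tabs and newlines, separate
    // words, so turn them into spaces.
    $text = preg_replace('/\p{Cc}/u', ' ', $text) ?? '';

    // Remove punctuation unless it has a letter or digit on both sides.
    $text = preg_replace(
        '/(?<![\p{L}\p{N}])\p{P}|\p{P}(?![\p{L}\p{N}])/u',
        ' ',
        $text
    ) ?? '';

    // Collapse all Unicode separators and whitespace to single spaces.
    $text = preg_replace('/[\p{Z}\s]+/u', ' ', $text) ?? '';

    return trim($text);
}
```

For example, `strip_punctuation("Hello, world! It's 3.14 (approx).")` yields `Hello world It's 3.14 approx`.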
The HTML tags on a web page must be stripped away to get clean text for a PHP search engine, keyword extractor, or some other page analysis tool. PHP's standard strip_tags() function will do part of the job, but you need to strip out styles, scripts, embedded objects, and other unwanted page code first. This tip shows how.
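A minimal sketch of that order of operations follows. The function name `html_to_text()` and the list of elements to drop are assumptions made for illustration, and a regular expression is only a rough stand-in for a real HTML parser.

```php
<?php
// Hypothetical helper: reduce an HTML page to plain text.
function html_to_text(string $html): string
{
    // strip_tags() removes tags but keeps what is between them, so
    // elements whose content is code must be removed whole first.
    // The element list here is an assumption, not an exhaustive set.
    $html = preg_replace(
        '@<(script|style|object|applet|noscript)\b[^>]*>.*?</\1\s*>@is',
        ' ',
        $html
    ) ?? '';

    // Put a space before each tag so words in adjacent elements
    // (for example, two paragraphs) do not run together.
    $html = str_replace('<', ' <', $html);

    // Now strip the remaining tags and comments.
    $text = strip_tags($html);

    // Convert entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse whitespace.
    $text = preg_replace('/\s+/', ' ', $text) ?? '';

    return trim($text);
}
```

Removing script and style elements first matters because their contents would otherwise survive `strip_tags()` and end up in the extracted text.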