  • May 27, 2008

    An absolute URL is complete and ready to use to download a web file. But web pages often include incomplete relative URLs with missing parts, such as the "http" scheme, the host name, or the first part of a file path. These parts need to be filled in by copying them from an absolute base URL. This article shows how and includes code to do it.
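
    As a rough illustration of the idea (not the article's own code), the sketch below fills in the missing parts of a relative URL using the standard java.net.URI class. The class name, method name, and example.com URLs are made up for the example.

```java
import java.net.URI;

public class RelativeUrlSketch {
    /**
     * Resolves a possibly relative reference against an absolute base URL.
     * Missing parts (scheme, host, leading path) are copied from the base.
     */
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // Hypothetical base URL, used only for illustration.
        String base = "http://example.com/docs/guide/index.html";

        System.out.println(resolve(base, "images/logo.png"));          // missing scheme, host, and directory
        System.out.println(resolve(base, "../about.html"));            // path relative to the parent directory
        System.out.println(resolve(base, "/top.html"));                // missing scheme and host only
        System.out.println(resolve(base, "//cdn.example.com/lib.js")); // missing scheme only
    }
}
```

    An already absolute reference passes through unchanged, so the same call can be applied to every link found on a page.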

  • May 27, 2008

    Splitting apart and rebuilding URLs is essential for link checkers, phishing detectors, spiders, and so on. PHP's standard parse_url() function handles simple URLs reasonably well, but it has problems with complex and relative URLs. And once a URL is split apart, there is no standard PHP function to reassemble it properly. This article reviews the official syntax of URLs, discusses URL parsing complexities, and provides new PHP functions to split apart a URL and join its parts together again.

  • April 20, 2008

    Java's threads are essential for building complex applications, but thread control is split across several classes in different packages added at different times in the JDK's history. This tip shows how to connect these classes together to find threads and thread groups, and get thread information.
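
    One common way to connect these classes, sketched here under the assumption that walking up from the current thread's group reaches the root group, is to enumerate every live thread from that root. The class and method names are illustrative and are not taken from the tip itself.

```java
import java.util.Arrays;

public class ThreadListingSketch {
    /** Walks up the parent chain from the current thread's group to the root group. */
    static ThreadGroup rootThreadGroup() {
        ThreadGroup group = Thread.currentThread().getThreadGroup();
        while (group.getParent() != null) {
            group = group.getParent();
        }
        return group;
    }

    /** Returns a snapshot of all live threads in the root group and its subgroups. */
    static Thread[] allThreads() {
        ThreadGroup root = rootThreadGroup();
        // activeCount() is only an estimate, so grow the buffer until
        // enumerate() no longer fills it completely.
        int capacity = Math.max(root.activeCount(), 1) * 2;
        Thread[] buffer;
        int count;
        do {
            buffer = new Thread[capacity];
            count = root.enumerate(buffer, true); // true = include subgroups
            capacity *= 2;
        } while (count == buffer.length);
        return Arrays.copyOf(buffer, count);
    }

    public static void main(String[] args) {
        for (Thread thread : allThreads()) {
            System.out.println(thread.getThreadGroup().getName()
                    + " / " + thread.getName() + " : " + thread.getState());
        }
    }
}
```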

  • April 13, 2008

    Web page keywords characterize the page's topic for a search engine. Extracting keywords requires that you recognize the page's character encoding, strip away HTML tags, scripts, and styles, decode HTML entities, and remove unwanted punctuation, symbols, numbers, and stop words. This article shows how.

  • March 20, 2008

    Performance optimization requires that you measure the time to perform a task, then try algorithm and coding changes to make the task faster. Prior to Java 5, the only way to time a task was to measure wall clock time. Unfortunately, this gives inaccurate results when there is other activity on the system (and there always is). Java 5 introduced the java.lang.management package and methods to report CPU and user time per thread. These times are not affected by other system activity, making them just what we need for benchmarking. This article shows how to use the java.lang.management package to benchmark your application.
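
    A minimal sketch of per-thread timing with java.lang.management follows. It assumes the JVM supports and has enabled CPU time measurement for the current thread; the class name, method name, and sample workload are invented for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeSketch {
    // Written by the sample workload so the loop has an observable result.
    static volatile long sink;

    /**
     * Runs the task on the current thread and returns
     * { CPU time, user time } in nanoseconds.
     */
    static long[] timeTask(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            throw new UnsupportedOperationException("Per-thread CPU time is not available");
        }
        long cpuStart = bean.getCurrentThreadCpuTime();
        long userStart = bean.getCurrentThreadUserTime();
        task.run();
        long cpu = bean.getCurrentThreadCpuTime() - cpuStart;
        long user = bean.getCurrentThreadUserTime() - userStart;
        return new long[] { cpu, user };
    }

    public static void main(String[] args) {
        long[] times = timeTask(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i % 7;
            }
            sink = sum;
        });
        System.out.println("CPU time (ns):  " + times[0]);
        System.out.println("User time (ns): " + times[1]);
    }
}
```

    The CPU time covers both user and system time for the thread, so subtracting the second value from the first gives an estimate of time spent in system calls.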

  • February 24, 2008

    Java has several classes for reading files, with and without buffering, random access, thread safety, and memory mapping. Some of these are much faster than the others. This article benchmarks 13 ways to read bytes from a file and shows which ways are the fastest.

  • January 6, 2008

    The starting point for building a link checker, web spider, or web page analyzer is, of course, to get the web page from the web server. Java's java.net package includes classes to manage URLs and to open web server connections. This tip shows how to use them to get a text, image, audio, or data file from a web server.

  • January 3, 2008

    URL extraction is at the core of link checkers, search engine spiders, and a variety of web page analysis tools. While <a> and <img> elements are primary sources of URLs, there are more than 70 element attributes with URLs in HTML, XHTML, WML, and assorted HTML extensions. This tip shows how to extract URLs from all of these.

  • January 3, 2008

    Though HTML is usually the focus for extracting URLs for a link checker or analysis tool, CSS files also include URLs. The CSS @import rule uses a URL to include another CSS file, and many style properties include a URL to load an image or other content. This tip shows how to scan a CSS file and extract its URLs.
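
    The scan can be approximated with regular expressions, as in the sketch below. This is a simplified illustration rather than the tip's implementation: it strips comments naively, ignores CSS escape sequences, and collects @import strings before url(...) values instead of preserving document order. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssUrlSketch {
    // url(...) with an optional single- or double-quoted argument.
    private static final Pattern URL_FUNCTION = Pattern.compile(
            "url\\(\\s*(?:\"([^\"]*)\"|'([^']*)'|([^)\\s]*))\\s*\\)",
            Pattern.CASE_INSENSITIVE);

    // @import "file.css" or @import 'file.css' (the form without url(...)).
    private static final Pattern IMPORT_STRING = Pattern.compile(
            "@import\\s+(?:\"([^\"]*)\"|'([^']*)')",
            Pattern.CASE_INSENSITIVE);

    /** Returns the URLs found in the CSS text: @import strings first, then url(...) values. */
    static List<String> extractUrls(String css) {
        // Naive comment removal; a string containing "/*" would confuse it.
        String text = css.replaceAll("(?s)/\\*.*?\\*/", "");
        List<String> urls = new ArrayList<>();
        collect(IMPORT_STRING.matcher(text), urls);
        collect(URL_FUNCTION.matcher(text), urls);
        return urls;
    }

    /** Adds the first non-empty capture group of every match to the list. */
    private static void collect(Matcher matcher, List<String> urls) {
        while (matcher.find()) {
            for (int g = 1; g <= matcher.groupCount(); g++) {
                String value = matcher.group(g);
                if (value != null && !value.isEmpty()) {
                    urls.add(value);
                    break;
                }
            }
        }
    }

    public static void main(String[] args) {
        String css = "/* url(ignored.png) */\n"
                + "@import \"base.css\";\n"
                + "@import url('theme.css');\n"
                + "body { background: url( images/bg.png ) no-repeat; }\n";
        System.out.println(extractUrls(css));
    }
}
```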

  • January 1, 2008

    Zebra stripes in a graphical user interface (GUI) are subtle background stripes painted behind the rows of a hierarchical list, or tree. They improve the readability of wide tree rows, but the JTree class in Java's Swing doesn't support them. This tip shows how to extend JTree to add zebra background stripes.

Nadeau software consulting