Removing HTML white space (spaces, tabs, and blank lines) and comments makes a file slightly smaller and faster to send to a site visitor. The improvement you get depends on how verbose your HTML is to start with. This article uses the HTML Tidy optimizer and measures the improvement for a sample web site and 22 different standard themes or page templates. Each theme generates different HTML and shows a different level of improvement from HTML optimization. Unfortunately, in all cases the improvement is tiny and probably not worth the effort.
Table of Contents
- How to remove HTML white space
- How well does it work?
- Effect on HTML page template/theme sizes
- Effect on HTML page sizes
- Effect on HTML page sizes, after compression
- Effect on HTML page sizes, after compression and including CSS and images
- Further reading
- Appendix: How I tested
This article is part of a series on Essential steps to speed up a Drupal web site. The discussion here, however, applies equally well to any type of web site, not just one that uses Drupal.
How to remove HTML white space
The bigger the HTML page, the longer it takes to send to your site's visitors. Speed up a site by reducing page size. One way is to remove HTML bytes that don't need to be there:
- Remove indentation.
- Remove blank lines.
- Remove comments.
- Concatenate short lines to remove unnecessary line breaks.
Indentation, comments, etc., are often in the HTML to make the file easier to read for the site's designers. But none of these make a difference to the site visitor or their web browser. Reduce page load times by removing these extra bytes.
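As a rough illustration of the removals listed above, here is a minimal Python sketch. It is not how HTML Tidy works internally; the function name and the sample page are made up for this example. It assumes the page has no conditional comments, no comment-wrapped inline scripts, and no <pre> or <textarea> content whose white space matters. It does not concatenate short lines.

```python
import re

# Matches ordinary HTML comments. Assumption: the page contains no
# conditional comments or comment-wrapped scripts that must be kept.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_whitespace(html: str) -> str:
    """Remove comments, indentation, trailing spaces, and blank lines."""
    without_comments = COMMENT_RE.sub("", html)
    kept_lines = []
    for line in without_comments.splitlines():
        stripped = line.strip()  # drops indentation and trailing spaces
        if stripped:             # drops lines that are now empty
            kept_lines.append(stripped)
    return "\n".join(kept_lines)


# Hypothetical sample page, for illustration only.
sample = """<html>
  <!-- page header -->
  <body>

    <p>Hello,   world</p>
  </body>
</html>
"""

smaller = strip_html_whitespace(sample)
print(len(sample.encode("utf-8")), "->", len(smaller.encode("utf-8")), "bytes")
```

White space inside a line is left alone on purpose, since collapsing it safely requires knowing which elements treat it as significant.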
Doing optimization by hand is tedious. Instead, use an HTML optimizer application. There are lots of free and commercial tools, including:
- Absolute HTML Compressor (freeware)
- Advanced HTML Optimizer (commercial)
- HTML Code Cleaner (commercial)
- HTML Optimizer (freeware)
- HTML-Optimizer (commercial)
- HTML-Optimizer Pro (commercial)
- HTML Tidy (freeware)
- Hunter HTML Optimizer (commercial)
All of these tools will do about the same thing. Their main differences are in the quality of their user interfaces, the operating systems they work on, and how much they cost.
HTML Tidy is my tool of choice. It is one of the first HTML optimizers, it's free, it is under constant development by the Open Source community, and it has been incorporated into multiple free and commercial products for Windows, Mac OS X, and Linux. There are command-line tools for use in scripts, programming libraries for custom Java, Perl, or Python programs, and the mod_tidy module for Apache 2, which automatically optimizes every HTML page delivered by the web server.
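For scripted use, the command-line tool can be driven from a short Python wrapper. This is only a sketch: the helper names are hypothetical, it assumes a tidy executable is on the PATH, and the option set shown (no indentation, no line wrapping, comments hidden) is one plausible choice rather than the settings used for the tests in this article.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tidy_command(tidy_path: str = "tidy") -> List[str]:
    """Return an argv asking tidy for compact output on stdout."""
    return [
        tidy_path,
        "-q",                      # quiet: suppress the summary report
        "--indent", "no",          # do not indent the output
        "--wrap", "0",             # disable line wrapping
        "--hide-comments", "yes",  # leave comments out of the output
        "--tidy-mark", "no",       # do not add a generator meta tag
    ]


def tidy_html(html: str) -> Optional[str]:
    """Run tidy on a string; return None if tidy is missing or fails."""
    tidy_path = shutil.which("tidy")
    if tidy_path is None:
        return None  # tidy is not installed on this machine
    result = subprocess.run(
        build_tidy_command(tidy_path),
        input=html,
        capture_output=True,
        text=True,
    )
    # tidy exits with 0 on success, 1 on warnings, 2 on errors.
    if result.returncode > 1:
        return None
    return result.stdout
```

Note that tidy also repairs markup, so its output can differ from the input in more than white space. Compare the two before deploying the result.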
Some other tools will do more "optimizations" than HTML Tidy, but beware. One "optimization" promoted by another tool removes all of the double-quotes around tag attribute values, which produces invalid XHTML. Another removes some closing tags, which again produces invalid XHTML. Still another replaces <strong> tags with <b>, <em> with <i>, etc., which changes the meaning of the markup and will break style sheets that select on the original tags. Once you turn off these unsafe optimizations in fancy tools, you get back to the same safe, valid set of optimizations that all of the tools can do.
Before spending time and money on any of these, though, let's do some testing to see what HTML optimization can do for a web site.
How well does it work?
I measured the effect of HTML optimization for a variety of test cases. I hand-edited several files first and compared them to the output of HTML Tidy. They differed by only a few bytes, which is surprisingly good. Instead of doing any further hand-editing, all of the following tests use HTML Tidy. I used the Balthisar Tidy user interface for HTML Tidy on a Mac.
Effect on HTML page template/theme sizes
Many web sites today use page templates or themes that provide the common elements of every site page, such as banners, menus, and footers. Only the body text changes from page to page. Change the template or theme, and the look of every page changes.
To optimize a site quickly, optimize the page template with HTML Tidy. I measured the improvement on a selection of page templates from themes for the Drupal content management system. The same could have been done for WordPress themes, or for templates used by Dreamweaver or iWeb. (See the appendix for details on how I did these tests.)
The average reduction in size for optimized page templates was 16%. This seems good, but a page template is only the shell of a page, lacking any real content. What happens if we add the content?
Effect on HTML page sizes
I applied each theme to a test web site and measured the size of the site's home page before and after optimization with HTML Tidy. The home page I used is fairly complex, with multiple blocks of text, lists, images, forms, and tables. (See the appendix for details on how I did these tests.)
The average reduction in size for optimized HTML was 4%. That's less impressive. Bytes were still saved, but the size of the content overwhelms the bytes saved through HTML optimization.
Effect on HTML page sizes, after compression
Production web sites should always use Apache file compression. What happens if we use HTML Tidy, then compress the pages using Apache and mod_gzip or mod_deflate? (See the appendix for details on how I did these tests.)
The average reduction in size for compressed optimized HTML was just 3%. Compression did a spectacular job, reducing page sizes by 78% and removing 21,000 bytes on average. In comparison, the 150-byte improvement from HTML optimization isn't very impressive.
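You can run the same kind of comparison on your own pages. The Python sketch below uses the standard gzip module as a stand-in for the web server's compression. The page is a made-up, highly repetitive sample and the white-space stripper is a simplification, so the percentages it prints will not match the figures above.

```python
import gzip
import re


def percent_saved(before: int, after: int) -> float:
    """Size reduction as a percentage of the original size."""
    return 100.0 * (before - after) / before


def strip_whitespace(html: str) -> str:
    """Simplified optimizer: drop comments, indentation, blank lines."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    lines = (line.strip() for line in html.splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical sample page, far more repetitive than a real one.
rows = "".join(
    f'      <li class="item">\n'
    f"        <!-- teaser {i} -->\n"
    f'        <a href="/node/{i}">Sample post {i}</a>\n'
    f"      </li>\n\n"
    for i in range(200)
)
page = "<html>\n  <body>\n    <ul>\n" + rows + "    </ul>\n  </body>\n</html>\n"

raw = page.encode("utf-8")
optimized = strip_whitespace(page).encode("utf-8")
raw_gz = gzip.compress(raw)
optimized_gz = gzip.compress(optimized)

print(f"optimization alone: {percent_saved(len(raw), len(optimized)):.1f}% saved")
print(f"compression alone:  {percent_saved(len(raw), len(raw_gz)):.1f}% saved")
print(f"optimization after compression: "
      f"{percent_saved(len(raw_gz), len(optimized_gz)):.1f}% saved")
```

The last line is the one that matters for a properly configured server: it shows how much optimization adds once compression is already on.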
The HTML is just part of a real web page. What happens if we include the style sheets and images?
Effect on HTML page sizes, after compression and including CSS and images
I re-calculated the improvement from HTML optimization, now including the size of CSS and images needed by the home page for each of the tested themes. (See the appendix for details on how I did these tests.)
The average reduction in size for optimized HTML in the context of a full page is just 0.4%. The size of the CSS and images for the page overwhelms the meager gains from HTML optimization.
HTML optimization seems like a good idea at first. But, in the real-world context of a web page with style sheets and images, and a web server properly configured to compress content, the few bytes saved from HTML optimization are hardly worth the effort.
To put this in perspective, the average savings in the above tests, after compression, was just 150 bytes. On a typical cable modem or DSL connection supporting 6 Mbps (megabits per second) downloads, those 150 bytes (1,200 bits) take about 1/5000th of a second to send. Saving that time isn't going to be noticeable to your site visitors.
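The arithmetic is easy to repeat for other connection speeds. This small Python sketch uses a hypothetical helper name and ignores latency and protocol overhead, so treat the result as an estimate only.

```python
def transfer_seconds(num_bytes: int, bits_per_second: float) -> float:
    """Idealized time to send num_bytes over a link of the given speed."""
    return num_bytes * 8 / bits_per_second


# 150 bytes saved, sent over a 6 Mbps (6,000,000 bits per second) link.
saved = transfer_seconds(150, 6_000_000)
print(f"{saved:.4f} seconds")  # prints "0.0002 seconds"
```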
Also consider that one web server message to get a file takes about 250 bytes, plus the size of the file. To optimize a site, you would do much better to remove just one image from the design, saving that 250 bytes for the message, plus the bytes for the image. Dropping even a tiny one-pixel image will save you more than HTML page optimization is likely to do.
An advertisement for one HTML optimization product claimed a 20% reduction in page size! But what is a "typical" page for that claim? Most pages today are created by authoring tools (such as Dreamweaver or iWeb) or by content management systems (such as Drupal or WordPress), and both produce lean HTML with little to remove in HTML optimization. If you can get a 20% reduction from optimization, then something is wrong with the authoring tool you are using.
Appendix: How I tested
All testing used a Drupal 5.1 web site loaded with sample content and a home page layout that listed teasers for the 10 most recent posts in the body. Blocks on the left side of the page supported menus and a recent image. Testing used standard themes downloaded from the Drupal web site.
All size measurements were done by simply counting bytes before and after running HTML Tidy on the theme template or the site home page. Full page sizes, including CSS and images, were measured by monitoring a page loaded into Firefox while using the Charles proxy server on a Mac.
As with all benchmarking, your results may vary due to differences in your page layout, content, and choice of theme or page template. Use these results only as a rough guideline.