Removing HTML white-space (spaces, tabs, blank lines, and comments) makes a file slightly smaller and faster to send to a site visitor. The improvement you get depends upon how verbose your HTML is to start with. This article uses the HTML Tidy optimizer and measures the improvement for a sample web site and 22 different standard themes or page templates. Each theme generates different HTML and shows a different level of improvement from HTML optimization. Unfortunately, in all cases the improvement is tiny and probably not worth the effort.
This article is part of a series on Essential steps to speed up a Drupal web site. The discussion here, however, applies equally well to any type of web site, not just one that uses Drupal.
How to remove HTML white space
The bigger the HTML page, the longer it takes to send to your site's visitors. Speed up a site by reducing page size. One way is to remove HTML bytes that don't need to be there:
- Remove indentation.
- Remove blank lines.
- Remove comments.
- Concatenate short lines to remove unnecessary line breaks.
Indentation, comments, and the like are often in the HTML to make the file easier for the site's designers to read. But none of these make a difference to the site visitor or their web browser. Reduce page load times by removing these extra bytes.
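The first three of those steps are easy to sketch in Python. This is a naive illustration, not how HTML Tidy or any of the tools below work, and `strip_html_whitespace` is a made-up name. It assumes the page has no `<pre>` or `<textarea>` blocks, no inline scripts or styles, and no IE conditional comments, all of which a real optimizer must leave intact. It also keeps the remaining line breaks, because joining lines safely requires knowing which elements are inline.

```python
import re


def strip_html_whitespace(html: str) -> str:
    """Naive sketch: remove comments, indentation, and blank lines.

    Assumes no <pre>, <textarea>, inline <script>/<style>, or IE
    conditional comments whose contents must be preserved.
    """
    # Remove comments; DOTALL lets a comment span several lines, and the
    # non-greedy .*? stops at the first closing "-->".
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Strip indentation (and trailing spaces) from every line.
    lines = (line.strip() for line in html.splitlines())
    # Drop the lines that are now empty; keep the remaining line breaks.
    return "\n".join(line for line in lines if line)


page = """<html>
  <!-- navigation -->
  <body>

    <p>Hello</p>
  </body>
</html>
"""
print(strip_html_whitespace(page))
```

Most of the savings come from the indentation; the comment and the blank line account for the rest.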
HTML optimizers
Doing optimization by hand is tedious. Instead, use an HTML optimizer application. There are lots of free and commercial tools, including:
- Absolute HTML Compressor (freeware)
- Advanced HTML Optimizer (commercial)
- HTML Code Cleaner (commercial)
- HTML Optimizer (freeware)
- HTML-Optimizer (commercial)
- HTML-Optimizer Pro (commercial)
- HTML Tidy (freeware)
- Balthisar Tidy (freeware)
- HTML Trim (freeware)
- Tidy UI (freeware)
- Hunter HTML Optimizer (commercial)
All of these tools will do about the same thing. Their main differences are in the quality of their user interfaces, the operating systems they work on, and how much they cost.
HTML Tidy is my tool of choice. It is one of the first HTML optimizers, it's free, it is under constant development by the Open Source community, and it has been incorporated into multiple free and commercial products for Windows, Mac OS X, and Linux. There are command-line tools for use in scripts, programmer libraries for custom Java, Perl, or Python programs, and the mod_tidy module for Apache 2 that will automatically optimize every HTML page delivered by the web server.
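Since HTML Tidy ships as a command-line tool, driving it from a script is straightforward. The Python sketch below is a minimal example, assuming a `tidy` executable on the PATH. The function names are hypothetical, and although the options are documented HTML Tidy settings, run `tidy -help-config` to confirm that your installed version supports each one.

```python
import shutil
import subprocess

# HTML Tidy configuration options (check `tidy -help-config` for your version).
TIDY_OPTIONS = [
    "--indent", "no",          # don't indent nested elements
    "--wrap", "0",             # don't re-wrap long lines
    "--hide-comments", "yes",  # drop <!-- ... --> comments
    "--tidy-mark", "no",       # don't add a "generator" <meta> tag
    "-quiet",                  # suppress nonessential output
]


def tidy_command(path: str) -> list[str]:
    """Build the argument list for one HTML Tidy run on `path`."""
    return ["tidy", *TIDY_OPTIONS, path]


def optimize_with_tidy(path: str) -> str:
    """Run HTML Tidy on `path` and return the optimized HTML from stdout."""
    if shutil.which("tidy") is None:
        raise RuntimeError("HTML Tidy ('tidy') was not found on the PATH")
    result = subprocess.run(tidy_command(path), capture_output=True, text=True)
    # Tidy exits with 0 (no problems), 1 (warnings only), or 2 (errors).
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.stdout
```

To optimize a page, call `optimize_with_tidy("index.html")` and save the returned text. Passing the options on the command line keeps the sketch self-contained; Tidy can also read them from a configuration file (`tidy -config`).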
Some other tools will do more "optimizations" than HTML Tidy, but beware. One "optimization" promoted by another tool removes all of the double-quotes around tag attribute values. This produces invalid XHTML, and invalid HTML whenever a value contains characters such as spaces or slashes. Another "optimization" removes some closing tags, and again this produces invalid XHTML. Still another "optimization" replaces <strong> tags with <b>, <em> with <i>, etc., which throws away the meaning of the markup and will break some style sheets. Once you turn off these risky optimizations in fancy tools, you get back to the same safe, valid set of optimizations that all of the tools can do.
Before spending time and money on any of these, though, let's do some testing to see what HTML optimization can do for a web site.
How well does it work?
I measured the effect of HTML optimization for a variety of test cases. I hand-edited several files first and compared them to the output of HTML Tidy. They differed by only a few bytes, which is surprisingly good. Instead of doing any further hand-editing, all of the following tests use HTML Tidy. I used the Balthisar Tidy user interface for HTML Tidy on a Mac.
Effect on HTML page template/theme sizes
Many web sites today use page templates or themes that provide the common elements of every site page, such as banners, menus, and footers. Only the body text changes from page to page. Change the template or theme, and the look of every page changes.
To optimize a site quickly, optimize the page template with HTML Tidy. I measured the improvement on a selection of page templates from themes for the Drupal content management system. The same could have been done for themes for WordPress, or templates used by Dreamweaver or iWeb. (See the appendix for details on how I did these tests.)
| Theme | Original (bytes) | Optimized (bytes) | Saved (bytes) | Percent saved |
|---|---|---|---|---|
| Aberdeen | 3,141 | 2,373 | 768 | 24% |
| Amadou | 3,733 | 2,286 | 1,447 | 39% |
| Andreas09 | 2,154 | 2,026 | 128 | 6% |
| Antique Modern | 2,970 | 2,223 | 747 | 25% |
| Arcmateria | 2,485 | 2,165 | 320 | 13% |
| Blix | 2,054 | 1,780 | 274 | 13% |
| Blue Breeze | 3,556 | 2,709 | 847 | 24% |
| Blue Marine | 2,304 | 2,113 | 191 | 8% |
| Brushed Steel | 4,530 | 3,291 | 1,239 | 27% |
| Fancy | 3,818 | 3,624 | 194 | 5% |
| Gagarin | 2,134 | 1,938 | 196 | 9% |
| Garamond | 2,324 | 2,015 | 309 | 13% |
| Garland | 3,706 | 2,929 | 777 | 21% |
| Glossy Blue | 2,349 | 2,080 | 269 | 11% |
| iTheme | 2,364 | 1,934 | 430 | 18% |
| Kubrick | 1,692 | 1,494 | 198 | 12% |
| News Portal | 1,840 | 1,720 | 120 | 7% |
| Ocadia | 3,241 | 2,837 | 404 | 12% |
| Push Button | 3,882 | 3,407 | 475 | 12% |
| Slash | 2,582 | 2,570 | 12 | 0% |
| Stylized Beauty | 3,500 | 2,734 | 766 | 22% |
| Zen | 3,685 | 2,623 | 1,062 | 29% |
| Average | | | 508 | 16% |
The average reduction in size for optimized page templates was 16%. This seems good, but a page template is only the shell of a page, lacking any real content. What happens if we add the content?
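For anyone who wants to check the arithmetic, this Python sketch recomputes the averages from the byte counts in the page-template table. The "Average" row is the mean of the per-theme values. Dividing the total bytes saved by the total original bytes instead gives a slightly higher percentage, because the larger templates tended to shrink by a larger share.

```python
# (theme, original bytes, optimized bytes), copied from the template table.
TEMPLATES = [
    ("Aberdeen", 3141, 2373), ("Amadou", 3733, 2286),
    ("Andreas09", 2154, 2026), ("Antique Modern", 2970, 2223),
    ("Arcmateria", 2485, 2165), ("Blix", 2054, 1780),
    ("Blue Breeze", 3556, 2709), ("Blue Marine", 2304, 2113),
    ("Brushed Steel", 4530, 3291), ("Fancy", 3818, 3624),
    ("Gagarin", 2134, 1938), ("Garamond", 2324, 2015),
    ("Garland", 3706, 2929), ("Glossy Blue", 2349, 2080),
    ("iTheme", 2364, 1934), ("Kubrick", 1692, 1494),
    ("News Portal", 1840, 1720), ("Ocadia", 3241, 2837),
    ("Push Button", 3882, 3407), ("Slash", 2582, 2570),
    ("Stylized Beauty", 3500, 2734), ("Zen", 3685, 2623),
]

saved = [orig - opt for _, orig, opt in TEMPLATES]
percents = [100 * (orig - opt) / orig for _, orig, opt in TEMPLATES]

# Mean of the per-theme numbers, as reported in the table's last row.
avg_saved = sum(saved) / len(saved)
avg_percent = sum(percents) / len(percents)

# Pooled figure: total bytes saved over total original bytes.
pooled_percent = 100 * sum(saved) / sum(orig for _, orig, _ in TEMPLATES)

print(f"average saved:   {avg_saved:.0f} bytes")
print(f"average percent: {avg_percent:.0f}%")
print(f"pooled percent:  {pooled_percent:.0f}%")
```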
Effect on HTML page sizes
I applied each theme to a test web site and measured the size of the site's home page before and after optimization with HTML Tidy. The home page I used is fairly complex, with multiple blocks of text, lists, images, forms, and tables. (See the appendix for details on how I did these tests.)
| Theme | Original (bytes) | Optimized (bytes) | Saved (bytes) | Percent saved |
|---|---|---|---|---|
| Aberdeen | 28,141 | 26,017 | 2,124 | 8% |
| Amadou | 27,764 | 25,631 | 2,133 | 8% |
| Andreas09 | 25,791 | 25,108 | 683 | 3% |
| Antique Modern | 27,057 | 25,601 | 1,456 | 5% |
| Arcmateria | 26,766 | 25,772 | 994 | 4% |
| Blix | 26,395 | 25,699 | 696 | 3% |
| Blue Breeze | 28,624 | 26,177 | 2,447 | 9% |
| Blue Marine | 26,911 | 26,030 | 881 | 3% |
| Brushed Steel | 29,418 | 26,933 | 2,485 | 8% |
| Fancy | 30,400 | 29,327 | 1,073 | 4% |
| Gagarin | 26,164 | 25,457 | 707 | 3% |
| Garamond | 24,773 | 24,102 | 671 | 3% |
| Garland | 28,273 | 26,607 | 1,666 | 6% |
| Glossy Blue | 24,789 | 24,151 | 638 | 3% |
| iTheme | 27,283 | 26,340 | 943 | 3% |
| Kubrick | 22,994 | 22,583 | 411 | 2% |
| News Portal | 25,743 | 24,954 | 789 | 3% |
| Ocadia | 26,374 | 25,820 | 554 | 2% |
| Push Button | 27,856 | 26,845 | 1,011 | 4% |
| Slash | 28,099 | 27,021 | 1,078 | 4% |
| Stylized Beauty | 28,013 | 26,555 | 1,458 | 5% |
| Zen | 28,634 | 26,272 | 2,362 | 8% |
| Average | | | 1,239 | 4% |
The average reduction in size for optimized HTML was 4%. That's less impressive. Bytes were still saved, but the size of the content overwhelms the bytes saved through HTML optimization.
Effect on HTML page sizes, after compression
Production web sites should always use Apache file compression. What happens if we use HTML Tidy, then compress the pages using Apache and mod_gzip or mod_deflate? (See the appendix for details on how I did these tests.)
| Theme | Original (bytes) | Optimized (bytes) | Saved (bytes) | Percent saved |
|---|---|---|---|---|
| Aberdeen | 5,986 | 5,718 | 268 | 4% |
| Amadou | 6,089 | 5,700 | 389 | 6% |
| Andreas09 | 5,567 | 5,522 | 45 | 1% |
| Antique Modern | 5,763 | 5,586 | 177 | 3% |
| Arcmateria | 5,795 | 5,632 | 163 | 3% |
| Blix | 5,751 | 5,648 | 103 | 2% |
| Blue Breeze | 5,862 | 5,698 | 164 | 3% |
| Blue Marine | 5,783 | 5,723 | 60 | 1% |
| Brushed Steel | 6,236 | 5,888 | 348 | 6% |
| Fancy | 6,365 | 6,229 | 136 | 2% |
| Gagarin | 5,708 | 5,599 | 109 | 2% |
| Garamond | 5,249 | 5,133 | 116 | 2% |
| Garland | 6,018 | 5,754 | 264 | 4% |
| Glossy Blue | 5,245 | 5,149 | 96 | 2% |
| iTheme | 5,953 | 5,764 | 189 | 3% |
| Kubrick | 5,080 | 5,017 | 63 | 1% |
| News Portal | 5,554 | 5,506 | 48 | 1% |
| Ocadia | 5,784 | 5,708 | 76 | 1% |
| Push Button | 5,951 | 5,859 | 92 | 2% |
| Slash | 5,805 | 5,765 | 40 | 1% |
| Stylized Beauty | 6,006 | 5,919 | 87 | 1% |
| Zen | 6,026 | 5,762 | 264 | 4% |
| Average | | | 150 | 3% |
The average reduction in size for optimized HTML after compression is just 3%. Compression alone did a spectacular job, reducing page sizes by 78% and removing about 21,000 bytes on average. In comparison, the 150-byte improvement from HTML optimization isn't very impressive.
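Why does compression shrink the benefit so much? Runs of identical indentation and line breaks are exactly the kind of repetition that gzip's DEFLATE algorithm encodes compactly, so most of those bytes largely disappear anyway. This Python sketch uses a made-up, highly repetitive list fragment (not one of the tested pages) to show the effect. Python's gzip module defaults to compression level 9, which may differ from your server's mod_deflate setting, so treat the exact numbers as illustrative.

```python
import gzip


def gz_size(data: str) -> int:
    """Bytes after gzip compression (Python's default level, 9)."""
    return len(gzip.compress(data.encode("utf-8")))


# A made-up fragment: 200 links in a list, once indented and once minified.
items = [f'<li><a href="/node/{i}">Post number {i}</a></li>' for i in range(200)]
pretty = "<ul>\n" + "".join(f"        {item}\n" for item in items) + "</ul>\n"
minified = "<ul>" + "".join(items) + "</ul>"

raw_saved = len(pretty) - len(minified)          # 8 spaces + 1 newline per item
gz_saved = gz_size(pretty) - gz_size(minified)   # what's left after gzip

print(f"uncompressed: {len(pretty)} -> {len(minified)} bytes, saved {raw_saved}")
print(f"gzipped:      {gz_size(pretty)} -> {gz_size(minified)} bytes, saved {gz_saved}")
```

The 1,802 bytes of white space removed before compression shrink to a small fraction of that once both versions are gzipped.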
The HTML is just part of a real web page. What happens if we include the style sheets and images?
Effect on HTML page sizes, after compression and including CSS and images
I recalculated the improvement from HTML optimization, now including the size of CSS and images needed by the home page for each of the tested themes. (See the appendix for details on how I did these tests.)
| Theme | Original (bytes) | Optimized (bytes) | Saved (bytes) | Percent saved |
|---|---|---|---|---|
| Aberdeen | 23,999 | 23,731 | 268 | 1.1% |
| Amadou | 36,187 | 35,798 | 389 | 1.1% |
| Andreas09 | 21,358 | 21,313 | 45 | 0.2% |
| Antique Modern | 52,785 | 52,608 | 177 | 0.3% |
| Arcmateria | 19,007 | 18,844 | 163 | 0.9% |
| Blix | 26,699 | 26,596 | 103 | 0.4% |
| Blue Breeze | 63,878 | 63,714 | 164 | 0.3% |
| Blue Marine | 13,864 | 13,804 | 60 | 0.4% |
| Brushed Steel | 99,466 | 99,118 | 348 | 0.4% |
| Fancy | 200,484 | 200,348 | 136 | 0.1% |
| Gagarin | 56,322 | 56,213 | 109 | 0.2% |
| Garamond | 36,359 | 36,243 | 116 | 0.3% |
| Garland | 26,192 | 25,928 | 264 | 1.0% |
| Glossy Blue | 32,006 | 31,910 | 96 | 0.3% |
| iTheme | 90,039 | 89,850 | 189 | 0.2% |
| Kubrick | 25,105 | 25,042 | 63 | 0.3% |
| News Portal | 20,427 | 20,379 | 48 | 0.2% |
| Ocadia | 76,697 | 76,621 | 76 | 0.1% |
| Push Button | 28,370 | 28,278 | 92 | 0.3% |
| Slash | 32,670 | 32,630 | 40 | 0.1% |
| Stylized Beauty | 28,573 | 28,486 | 87 | 0.3% |
| Zen | 49,112 | 48,848 | 264 | 0.5% |
| Average | | | 150 | 0.4% |
The average reduction in size for optimized HTML in the context of a full page is just 0.4%. The size of the CSS and images for the page overwhelms the meager gains from HTML optimization.
Conclusions
HTML optimization seems like a good idea at first. But, in the real-world context of a web page with style sheets and images, and a web server properly configured to compress content, the few bytes saved from HTML optimization are hardly worth the effort.
To put this in perspective, the average savings in the above tests, after compression, was just 150 bytes. On a typical cable modem or DSL connection supporting 6 Mbps (megabits per second) downloads, those 150 bytes take about 1/4000th of a second to send. Saving that time isn't going to be noticeable to your site visitors.
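Here is the arithmetic behind that estimate as a small Python sketch (the function names are hypothetical). It assumes about 10 bits on the wire for every byte of data, a common rule of thumb that pads the 8 data bits to allow for protocol overhead. Counting a strict 8 bits per byte gives 1/5000th of a second instead, which doesn't change the conclusion.

```python
def transfer_seconds(num_bytes: float, mbps: float, bits_per_byte: int = 10) -> float:
    """Seconds to send num_bytes over a link of `mbps` megabits per second.

    bits_per_byte=10 is a rule-of-thumb allowance for protocol overhead
    (8 data bits plus about 2); pass 8 for the raw line rate.
    """
    return num_bytes * bits_per_byte / (mbps * 1_000_000)


def bytes_in(seconds: float, mbps: float, bits_per_byte: int = 10) -> float:
    """Inverse of transfer_seconds: how many bytes fit in `seconds`."""
    return seconds * mbps * 1_000_000 / bits_per_byte


t = transfer_seconds(150, 6)
print(f"150 bytes at 6 Mbps: {t * 1000:.2f} ms, about 1/{1 / t:.0f} of a second")
print(f"bytes that fit in 5 seconds: {bytes_in(5, 6):,.0f}")
```

Run in reverse, the same arithmetic shows that making a page a full 5 seconds faster at 6 Mbps would require removing about 3,000,000 bytes.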
Also consider that each request for a file costs about 250 bytes of messages between the browser and the web server, plus the size of the file. To optimize a site, you would do much better to remove just one image from the design, saving those 250 bytes of messages plus the bytes for the image. Dropping even a tiny one-pixel image will save you more than HTML page optimization is likely to.
An advertisement for one HTML optimization product claimed a 20% reduction in page size! But what is a "typical" page for that claim? Most pages today are created by authoring tools (such as Dreamweaver or iWeb) or by content management systems (such as Drupal or WordPress), and both ways produce lean HTML with little to remove in HTML optimization. If you can get a 20% reduction from optimization, then something is wrong with the authoring tool you are using.
Further reading
- Essential steps to speed up a Drupal web site. There are more effective ways to speed up a web site, including enabling file compression, using a PHP script cache, enabling MySQL's query cache, and enabling Drupal's page cache and CSS file aggregation.
- Don't bother using CSS optimization to speed up a web site. CSS files often include spaces, blank lines, comments, and redundant selectors and properties. Removing them makes the CSS smaller and faster to send to a site visitor, but the savings won't yield a noticeable improvement in page load times.
Appendix: How I tested
All testing used a Drupal 5.1 web site loaded with sample content and a home page layout that listed teasers for the 10 most recent posts in the body. Blocks on the left side of the page supported menus and a recent image. Testing used standard themes downloaded from the Drupal web site.
All size measurements were done by simply counting bytes before and after running HTML Tidy on the theme template or the site home page. Full page sizes, including CSS and images, were measured by monitoring a page loaded into Firefox while using the Charles proxy server on a Mac.
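A minimal Python sketch of that kind of byte counting follows. It is an illustration, not the procedure used for the tables above: `measure` and the file names in the usage note are hypothetical, and the gzipped sizes use Python's gzip defaults, which won't exactly match what Apache's mod_deflate sends.

```python
import gzip
from pathlib import Path


def measure(original: Path, optimized: Path) -> dict[str, int]:
    """Byte counts for one page before and after optimization.

    The *_gz entries approximate what a compressing web server would send,
    using Python's gzip defaults rather than Apache's mod_deflate settings.
    """
    before = original.read_bytes()
    after = optimized.read_bytes()
    return {
        "original": len(before),
        "optimized": len(after),
        "saved": len(before) - len(after),
        "original_gz": len(gzip.compress(before)),
        "optimized_gz": len(gzip.compress(after)),
    }
```

For example, save the home page as `home.html`, run it through HTML Tidy to produce `home-tidy.html`, and call `measure(Path("home.html"), Path("home-tidy.html"))`.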
As with all benchmarking, your results may vary due to differences in your page layout, content, and choice of theme or page template. Use these results only as a rough guideline.

Disagree
I have compressed my pages with the Absolute HTML tool and I have to say my site is running a lot faster, and pingdom.com results say the page load time has dropped by 5-10 seconds on all my pages... I would not say that compressing pages is hardly worth it; in fact I think it can be very good when all else fails...
Re: Disagree
Well, if you shaved 5-10 seconds by removing white space, then something was seriously wrong with your HTML in the first place! Consider that a typical cable modem can receive about 6 Mbits/second, or roughly 600 Kbytes/second once protocol overhead is allowed for. So, shaving 5 seconds requires a reduction of 5 * 600 = 3,000 Kbytes, or 3 Mbytes! And if you had 3 Mbytes of unnecessary spaces in your HTML, then you badly need to re-examine whatever tool you're using to create that HTML. Don't depend upon optimizer tools to patch your problems; fix the problems instead.
As another point for comparison, a typical complex web page is between 50 and 100 Kbytes (HTML only). A savings of 3 Mbytes is 30 times larger than the total size of most web pages. That seems unlikely.
If you used Pingdom's "full page test", then be aware that its results are not useful. Here are some of its problems:
The numbers reported by Pingdom, and similar tools, are of no use. Instead, try Firefox's Web Developer and YSlow plugins or Safari's development features. These give real load times in real browsers under real conditions.
Also note that "Absolute HTML Compressor", like many others, can create invalid HTML and change the appearance of your web pages. For instance, it can remove <!DOCTYPE> declarations, which will switch a page from standards mode to quirks mode rendering and affect the layout. It can remove <META> tags, which can affect how page character sets are interpreted and how web spiders treat page links. It can remove double-quotes around attribute values, which can make the code invalid HTML (though many browsers will accept it anyway).
Be aware that so-called compressor/optimizer tools game the results they advertise by enabling these invalid and appearance-changing "optimizations" and by applying their tools to ridiculously bad HTML in the first place. When you test these tools on real content, as I have, you find that they provide little benefit and a lot of potential for messed-up results. Shaving a few hundred or even a few thousand bytes from a web page has little impact on page load time. Unless you've got 3 Mbytes of unnecessary spaces in your HTML...
Disagree
Perhaps that's true in a Drupal environment. I've saved 50% by compressing JS, CSS, and HTML on some projects.
The real performance gains come from limiting the number of requests the client's web browser has to make, more so than from the actual file size. If you can get all your content in 4 requests, then your page will load instantly, as web browsers are hardwired to make only 4 requests at a time.
So using 4 requests:
1. (IMAGE.PNG) CSS sprite image file
2. (SITE.CSS) compressed CSS
3. (MYSCRIPT.JS) JavaScript include
4. (INDEX.HTML) HTML
That's how you make a website scream.
Re: Disagree
First, think about that... you saved 50%? That would require that every other character in those files is a space, carriage-return, or part of a comment. If that's really the case, something is seriously wrong with the way you are authoring these files. Either you are using very bad tools or you are embedding large comment blocks in those files. Get better tools and move comment blocks to a design document. They should never be included in anything you download over and over and over.
Second, you should always use Apache's compression (mod_deflate) to zip files as they are being delivered. This will substantially reduce the size of any text file and save you more than white-space removal will. Once you enable compression, you'll find that the impact of white-space removal is too trivial to be bothered with (see the article's results above).
Third, the HTTP 1.1 specification (RFC 2616) says a browser should not keep more than two connections open to the same server, which limits it to two parallel downloads, not four. Some browsers don't follow these rules, and some browser plugins let you override these rules. But as a web site developer, you should design for browsers that follow the rules.
Fourth, the way browsers make parallel requests is more complex than you imply. Browsers start with the HTML file and queue file requests as they encounter tags for images, flash, CSS files, and scripts. File queuing recurses into CSS files to queue background images for the site's theme. File queuing also pauses on scripts until the script finishes executing because it may change the web page's content and what files need to be queued next. The order in which files are requested from the queue then depends upon the order in which requested files arrive, and that depends upon network and web server load, and what is cached at the browser or within the network. All of this makes trying to predict exactly how a browser will request files nearly impossible, so designing a site to optimize for this order is pointless.
Instead of worrying about white-space removal, compression, and parallel browser file requests, use this one simple rule:
The time it takes to request and get a file swamps the time it takes to download it (unless it is a giant multi-megabyte file). Don't worry about making files a few bytes smaller. Worry about using fewer files in the first place.
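To see why, here is a deliberately crude Python model of page load time: one network round trip per file plus the time to transfer the bytes. Every number in it is an assumption for illustration only: a 100 KB page made of 20 files, a 50 ms round trip, a 6 Mbps link, requests made one after another, the article's estimate of about 250 bytes of messages per request, and a hypothetical 100-byte image.

```python
def load_seconds(num_files: int, total_bytes: int,
                 rtt_seconds: float = 0.05, mbps: float = 6.0) -> float:
    """Crude page-load model: one round trip per file plus transfer time.

    Simplifying assumptions: files are fetched one after another, each
    request costs one network round trip of `rtt_seconds`, and each byte
    takes about 10 bits on the wire.
    """
    transfer = total_bytes * 10 / (mbps * 1_000_000)
    return num_files * rtt_seconds + transfer


base = load_seconds(num_files=20, total_bytes=100_000)
# Option 1: HTML optimization saves 150 bytes.
smaller_html = load_seconds(num_files=20, total_bytes=100_000 - 150)
# Option 2: drop one tiny image: one fewer request, ~250 bytes of
# request/response messages, plus the image itself (assumed 100 bytes).
one_less_file = load_seconds(num_files=19, total_bytes=100_000 - 250 - 100)

print(f"save 150 bytes of HTML: {1000 * (base - smaller_html):.2f} ms faster")
print(f"drop one tiny image:    {1000 * (base - one_less_file):.2f} ms faster")
```

Real browsers overlap requests, which shrinks the per-file cost, but even split across two parallel connections a 50 ms round trip is still about a hundred times the 0.25 ms that 150 bytes take to send.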