Publishing an email address on a web page invites more spam. Protect your address by masking it from the email harvesters (spambots) used by spammers. This article tests 50 masking methods against 23 harvesters to see which methods work to stop spammers, and which do not.
Spam, spam, spam
A 2002 U.S. Federal Trade Commission (FTC) study, Email Address Harvesting: How Spammers Reap What You Sow (PDF), posted test email addresses around the Internet and measured the amount of spam they received. 86% of all email addresses published on web pages received spam (PDF chart). One test address received spam just nine minutes after the address was first published. So, what can you do to publish your address, but keep spammers from finding it?
The FTC recommends masking your address to make it harder to find by the automated email harvesters (“spam robots” or “spambots”) used by spammers. But there have been dozens of methods proposed to mask an address. This article tests 50 of them. Email addresses masked by each method are run past a collection of harvesters to see which methods work, and which do not. The results may surprise you. Many popular email masking methods don’t work.
How to protect email addresses
To better cover all of these masking methods, I've grouped similar methods together and review each group in its own article. Each article explains the method, shows examples, and tabulates harvester test results.
The table below summarizes the email address protection methods reviewed in these articles. Each method is graded on its effectiveness, browser support, usability, and accessibility for the disabled. A method is effective only if none of the tested email harvesters bypassed it. A method is well-supported only if all major current web browsers support it well. And a method is usable and accessible if it is not awkward for visitors to use and it is readable by screen readers used by the visually impaired.
|Well-supported||Good usability||Good accessibility|
|Plain email address||√||√||√|
|Replace the “@” with a character code - decimal||√||√||√|
|Replace the “@” with a character code - hex||√||√||√|
|Replace the whole address with character codes - decimal||√||√||√|
|Replace the whole address with character codes - hex||√||√||√|
|Replace the whole address with character codes - mix||√||√||√|
|Replace the address in a mailto link with URL character codes||√||√||√|
|Use CSS to reverse a backwards email address||for now|
|Use a <bdo> tag to reverse a backwards email address||for now|
|Split the address onto separate lines||√||√||√||√|
|Add “nospam” within the address - user name||√||awkward||awkward|
|Add “nospam” within the address - domain name||√||awkward||awkward|
|Spell out the punctuation - “ at ”||√||awkward||√|
|Spell out the punctuation - “(at)”||√||awkward||awkward|
|Spell out the punctuation - “[at]”||√||awkward||awkward|
|Add spaces between the characters||for now||√|
|Embed an HTML comment||for now||√||√||√|
|Embed an HTML tag - empty||√||√||√|
|Embed an HTML tag - around “@”||√||√||√|
|Embed an HTML tag - hidden text||√||√|
|Distribute characters into HTML table cells||√||√|
|Draw the address using a CSS “font”||√||√|
|Draw the address using ASCII art||√||√|
|Use AJAX to retrieve and insert an email address||√||must enable||√||√|
|Use CSS to insert an email address||√|
|Image and Flash insertion methods:|
|Replace “@” with an image of an “@”||√||√|
|Replace the whole address with an image||√||√|
|Replace the whole address with a Flash animation||√||must enable|
|Page hiding methods:|
|Use “robots.txt” for a web site|
|Use meta tags on a page - nofollow||√||√||√|
|Use meta tags on a page - noindex||√||√||√|
|Use “nofollow” on links|
|Use Flash to link to a hidden page||√||must enable||√|
|Use a form to link to a hidden page||√||√||√|
|Embed a page within a frame||√||√||√|
|Embed a page within an iframe||√||√||√|
|Redirect to a “mailto” link - PHP||√||√||√|
|Redirect to a web page - PHP||√||√|
|Redirect to a web page - Apache||√||√|
|Redirect to a web page - Meta refresh tag||√|
|Spammer blocking methods:|
|Block access based upon the IP address||√||√||√|
|Block access based upon the user-agent||√||√||√|
|Require a login to access the site||√||√|
|Contact form methods:|
|Use a contact form||√||√|
|Add a CAPTCHA image challenge||√||√|
|Add a CAPTCHA math challenge||√||√||√|
About two-thirds of the email address protection methods tested are not effective at stopping spammers. Surprisingly, many popular methods are not effective, including:
- Obfuscating an email address with ASCII character codes.
- Replacing “@” with “at”, “(at)”, or “[at]” in an email address.
- Embedding HTML tags within an email address.
- Using “robots.txt”, meta tags, and URL “nofollow” attributes to stop web spiders from visiting site pages.
Overall, most email address protection methods that are effective also have poor usability and accessibility. Tricky methods sometimes don’t work in all browsers. Contact forms, login pages, Flash animations, and CAPTCHA challenges will annoy some visitors. If you obsess too much about protecting your email address from spammers, you’ll also block or annoy legitimate visitors. Spam may be the price you pay for maintaining open lines of communication with your web site’s visitors.
Split the email address onto two lines
However, keep in mind that while an automated harvester may be stopped by this and some of the other methods, a human harvester will get them all. No matter how well you protect your address, you’ll probably still get some spam. Use a spam filter for your email program. A 2005 U.S. Federal Trade Commission study, Email Address Harvesting and the Effectiveness of Anti-Spam Filters (PDF), found that 95% of spam could be stopped by a spam filter.
What else can you do?
There are several more methods that I didn’t try because they are complex to set up or they won’t work (yet) in most web browsers:
- Use a disposable email address. Publish your email address as plain text and don’t worry about protecting it. When it starts getting too much spam, delete the old email address and make a new one. This can be a hassle to manage, so there are many web services available that provide disposable email addresses. The free flow-to.com service has an interesting twist. Instead of posting your email address to a web page, you post a special link to their site. The site responds with a generated one-time-use email address that, when mailed to within 24 hours, will forward the email to you. If harvested, the generated address is unlikely to be valid by the time it gets used.
- Draw an email address using Scalable Vector Graphics (SVG). SVG is a formatting language that describes 2D drawings containing lines, areas, and labels. A drawing could include an email address text label or an address drawn with lines. While Adobe has an SVG Viewer plugin, the goal is to build SVG support directly into web browsers. SVG is partially supported by some current web browsers, such as Firefox and Opera, but not yet by Safari or Internet Explorer.
- Draw an email address using Microsoft’s Silverlight plugin. This is Microsoft’s new Flash-like plugin to draw shapes on a web page. Like Adobe’s Flash, it can be used to draw an email address. As of this writing, the plugin has just been announced in a beta release. It may take years for it to become widely available.
- Draw an email address in a movie shown by Apple’s QuickTime, Real’s RealPlayer, or Microsoft’s Windows Media Player plugins. Most visitors probably have one of these movie players installed. You could create an MPEG4, AVI, QT, or WMV movie containing your email address and play it on a web page. However, the approach is pretty awkward and it will slow down page loads while the browser waits to get the movie from your web server.
- Draw an email address using Java. The Java programming language is an excellent way to create interactive applications, including those started from a Web page. However, there are technical complexities in making Java work for all web browsers and Java is way overkill for protecting a single email address.
- Draw an email address using PDF. This format is widely used for posting fully-formatted documents on the web. Adobe’s Acrobat Reader or Apple’s Preview can show these documents. While a PDF document is an awkward way to protect one email address, it would work well to protect a list of email addresses for a company contact list.
- Draw an email address using other plugins. There are many more plugins for web browsers. You could use Elsevier’s MDL plugin and draw an email address as a chemical formula. Or use a VRML or X3D plugin to draw your email address in flashy 3D. Or embed an email address in blueprints using AutoCAD’s DXF format, if visitors have a plugin to show it. In certain markets, these plugins are common. But on the web at large, visitors are unlikely to have or want these plugins.
What email harvesters did I test?
Sorry, I won’t publish (or email you) the names of the email harvesters I tested. I don't want their product names on my web pages, or the unwanted search engine attention those names would bring. And I don't want their developers using my testing to show how well their products work compared to their competitors.
If you really want to do your own tests, search the web with the obvious keywords.
Beware of doing your own testing
As I collected and tested these email harvesters, I observed a few things:
- One email harvester came with a virus embedded. Installation of two harvesters was blocked by Windows when they tried to access protected memory. One email harvester’s installation tried to add an unexplained background service to Windows.
- Every email harvester must access the Internet. Are they just harvesting, or is there anything else that they are sending to and from your computer? One email harvester had a suspiciously high CPU and bandwidth usage while it was idle.
- Several email harvesters can scan a hard drive too. You can aim them at your web browser cache. Or consider how others might use them at an Internet cafe, library, or other public computer site. Or as a virus payload on your computer.
- Several email harvesters could be installed as invisible Internet Explorer plugins to scan web pages as they are browsed. Consider how that might be used at an Internet cafe.
- Several email harvester makers also sell Internet cafe management products. How convenient.
- One email harvester maker also makes a “corporate monitoring” product that does keystroke logging, provides hidden remote access to a PC’s files, and enables remote controls to start applications such that they won’t show up in the Windows task manager. That’d be a handy tool for controlling spammer zombies.
- There are dozens of bulk emailers for sending “newsletters” to a mailing list. Some can be started under remote control without anything showing up on the computer’s screen.
- One application at a bulk email software site helpfully emails you whenever a PC's dynamic IP address changes. That’d be handy for keeping track of spammer zombies.
- Several email harvester developers actually have “anti-spam” policies on their web site. We’re assured that they are “strictly opposed” to spam.
- My favorite harvester/mailer company tag line: “We make a better world.”
If you really must try these yourself, be careful.
Future email harvesting technology
Spammers use the same type of computer as you or me. And each year, the latest computer on the market is almost twice as fast as the year before. When a spammer upgrades their computer, they can use the faster processor to do more sophisticated text scanning and to do a better job of defeating email masking schemes. What could they do in the near future?
Text scanning has two phases: lexical analysis and syntax parsing. Lexical analysis looks for patterns in a sequence of characters, and syntax parsing looks for patterns in those patterns. Lexical analysis is fast, but syntax parsing is slower. For maximum harvesting speed, most current email harvesters use simple lexical analysis and no syntax parsing. They recognize obvious patterns of characters, but they do not look at the context around those characters. With faster computers, spammers can use smarter software to better extract email addresses. When they do, many of today’s email address protection methods will become ineffective.
For example, a regular expression used by a lexical analyzer can easily find an email address like “email@example.com.” Here’s the expression:
For most people this looks like jumbled nonsense, but to a programmer and a regular expression parser, this says “one or more letters or numbers, an @, and one or more letters or numbers”. You can test this yourself, and learn more about regular expressions, by using Rob Locher’s nice Regular Expression Tester.
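A pattern of the kind described, “one or more letters or numbers, an @, and one or more letters or numbers”, is easy to try in Python. This is only a sketch under assumptions: the character class (letters, digits, underscore, hyphen, and period) is borrowed from the longer expression shown later in this article, and the function name `find_addresses` and the sample text are made up for illustration.

```python
import re

# Assumed pattern: one or more letters, digits, "_", "-", or "." on each
# side of a literal "@". re.IGNORECASE plays the role of the "i" flag.
ADDRESS = re.compile(r"[A-Z0-9_\-.]+@[A-Z0-9_\-.]+", re.IGNORECASE)


def find_addresses(text):
    """Return every substring of text that looks like an email address."""
    return ADDRESS.findall(text)


if __name__ == "__main__":
    page = "Questions? Write to email@example.com or call us."
    print(find_addresses(page))
```

Because the character class includes the period, a sentence-ending period that directly follows an address is captured along with it, so a harvester built this way would need to trim it afterwards.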
Email harvesters are already doing this. It is very easy to extend this to look for “person AT example.com,” a masking technique recommended by the U.S. Federal Trade Commission back in 2005. Here’s the expression:
/[A-Z0-9_\-\.]+ *(@|AT) *[A-Z0-9_\-\.]+/gi
Email harvesters are already doing this too, so I’m not giving anything away here.
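The expression above can be exercised with a few lines of Python. This is a sketch, not any real harvester's code: the function name `find_masked` is hypothetical, and capturing groups have been added around the user and domain parts (an assumption on my part) so that the separator can be swapped back to a plain “@”.

```python
import re

# The article's expression with groups added around the two name parts.
# finditer() stands in for the "g" flag, re.IGNORECASE for the "i" flag.
MASKED = re.compile(
    r"([A-Z0-9_\-.]+) *(?:@|AT) *([A-Z0-9_\-.]+)",
    re.IGNORECASE,
)


def find_masked(text):
    """Return each match rebuilt as user@domain."""
    return [
        match.group(1) + "@" + match.group(2)
        for match in MASKED.finditer(text)
    ]


if __name__ == "__main__":
    for sample in (
        "person AT example.com",
        "person at example.com",
        "person @ example.com",
    ):
        print(sample, "->", find_masked(sample))
```

All three samples come back as `person@example.com`. The pattern is loose, though: the spaces around the separator are optional, so an ordinary word containing “at”, such as “matter”, also matches and comes back as `m@ter`. A harvester using it would have to filter such hits, perhaps by requiring a period in the domain part.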
It is pretty easy to extend this to match any predictable pattern for representing an email address. All of the following can be recognized in a similar way:
person @ example.com
person at example.com
All of this can be done with fast simple lexical analysis. When you add in syntax parsing, you can find more complex patterns where the name and address are more spread out. For instance, syntax parsing can recognize the method I recommended earlier:
Syntax parsing also can recognize this:
|User name||Domain name|
A parser can easily extract rows and columns from a table. Some email harvesters can do this now.
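To give a sense of how little work this takes, here is a minimal sketch using the HTML parser in Python's standard library. Everything specific in it is an assumption made for illustration: the class name `TableRows`, the function name `addresses_from_table`, the sample table, and the rule of joining the cells of any two-cell row with “@”.

```python
import re
from html.parser import HTMLParser

ADDRESS = re.compile(r"[A-Z0-9_\-.]+@[A-Z0-9_\-.]+", re.IGNORECASE)


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def addresses_from_table(markup):
    """Join the cells of each two-cell row with "@"; keep plausible addresses."""
    parser = TableRows()
    parser.feed(markup)
    parser.close()
    found = []
    for row in parser.rows:
        if len(row) == 2:
            candidate = row[0].strip() + "@" + row[1].strip()
            if ADDRESS.fullmatch(candidate):
                found.append(candidate)
    return found


if __name__ == "__main__":
    sample = """
    <table>
      <tr><th>User name</th><th>Domain name</th></tr>
      <tr><td>person</td><td>example.com</td></tr>
    </table>
    """
    print(addresses_from_table(sample))
```

The header row is rejected because “User name@Domain name” contains spaces, while the data row yields `person@example.com`.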
Any predictable pattern can be parsed. It just takes a bit more computer time. And computer time is cheap and getting cheaper. The email address split onto two lines above is only safe today because it isn’t a common enough pattern yet for spammers to have bothered adding it to their harvesters.
Any email masking method based upon a predictable pattern is unsafe. Every numeric character code-based obfuscation method is unsafe. Every method that predictably spells out punctuation or adds spaces or HTML tags is unsafe. Every method that uses a well-known algorithm to scramble or encrypt the letters is unsafe. Every method that predictably spreads out an address onto multiple lines or into table cells is unsafe. If it is predictable, it is parsable by a future email harvester.
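As a rough illustration of why the character-code, comment, and tag tricks are predictable, the Python sketch below undoes them with standard library calls before searching. It is an assumption-laden example, not any real harvester's code: the function name `harvest` and the address pattern are my own, and it handles only the few maskings named in its comments.

```python
import html
import re
from urllib.parse import unquote

ADDRESS = re.compile(r"[A-Z0-9_\-.]+@[A-Z0-9_\-.]+", re.IGNORECASE)


def decode(text):
    """Turn &#64; / &#x40; style codes and %40 style URL codes into characters."""
    return unquote(html.unescape(text))


def harvest(markup):
    """Search both the decoded markup and its tag-stripped text."""
    # Pass 1: decode character codes but keep tags, so addresses inside
    # attributes such as href="mailto:..." are still visible.
    decoded = decode(markup)
    # Pass 2: drop comments and tags first, so an address interrupted by
    # <!-- ... --> or <span></span> is stitched back together.
    without_comments = re.sub(r"<!--.*?-->", "", markup, flags=re.DOTALL)
    visible = decode(re.sub(r"<[^>]+>", "", without_comments))

    found = []
    for text in (decoded, visible):
        for address in ADDRESS.findall(text):
            if address not in found:
                found.append(address)
    return found


if __name__ == "__main__":
    samples = [
        "person&#64;example.com",
        "person&#x40;example.com",
        '<a href="mailto:person%40example.com">mail me</a>',
        "person<!-- nospam -->@example.com",
        "person<span></span>@example.com",
    ]
    for sample in samples:
        print(harvest(sample))
```

Each sample reduces to `person@example.com`. The extra cost over a plain pattern match is a couple of string passes per page.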
Unfortunately, almost every method used to mask email addresses also diminishes the usability and accessibility of the web. A real fix to the spam problem must involve stopping the spammers themselves, whether by legal or technical changes to the Internet.