Effective methods to protect email addresses from spammers

Technologies: HTML 4+, CSS 2+, PHP (optional), Flash (optional)

Publishing an email address on a web page invites more spam. Protect your address by masking it from the email harvesters (spambots) used by spammers. This article tests 50 masking methods against 23 harvesters to see which methods work to stop spammers, and which do not.

Spam, spam, spam

A 2002 U.S. Federal Trade Commission (FTC) study, Email Address Harvesting: How Spammers Reap What You Sow (PDF), posted test email addresses around the Internet and measured the amount of spam they received. 86% of all email addresses published on web pages received spam (PDF chart). One test address received spam just nine minutes after the address was first published. So, what can you do to publish your address, but keep spammers from finding it?

The FTC recommends masking your address to make it harder to find by the automated email harvesters (“spam robots” or “spambots”) used by spammers. But there have been dozens of methods proposed to mask an address. This article tests 50 of them. Email addresses masked by each method are run past a collection of harvesters to see which methods work, and which do not. The results may surprise you. Many popular email masking methods don’t work.

How to protect email addresses

To better cover all of these masking methods, I've grouped similar methods together and review each group in its own article. Each article explains the method, shows examples, and tabulates harvester test results.

Results

The table below summarizes the email address protection methods reviewed in these articles. Each method is graded on its effectiveness, browser support, usability, and accessibility for the disabled. A method is effective only if none of the tested email harvesters bypassed it. A method is well-supported only if all major current web browsers support it well. And a method is usable and accessible if it is not awkward for visitors to use and it is readable by screen readers used by the visually impaired.

Summary of email masking methods
Columns: Effective at stopping harvesters, Well-supported, Good usability, Good accessibility
Plain email address  
Obfuscation methods:
Replace the “@” with a character code - decimal  
Replace the “@” with a character code - hex  
Replace the whole address with character codes - decimal  
Replace the whole address with character codes - hex  
Replace the whole address with character codes - mix  
Replace the address in a mailto link with URL character codes  
Use CSS to reverse a backwards email address for now      
Use a <bdo> tag to reverse a backwards email address for now      
Fragmenting methods:
Split the address onto separate lines
Add “nospam” within the address - user name   awkward awkward
Add “nospam” within the address - domain name   awkward awkward
Spell out the punctuation - “ at ”   awkward
Spell out the punctuation - “(at)”   awkward awkward
Spell out the punctuation - “[at]”   awkward awkward
Add spaces between the characters for now    
Embed an HTML comment for now
Embed an HTML tag - empty  
Embed an HTML tag - around “@”  
Embed an HTML tag - hidden text    
Distribute characters into HTML table cells    
Draw the address using a CSS “font”    
Draw the address using ASCII art    
JavaScript and CSS text insertion methods:
Use JavaScript to insert an email address must enable
Use JavaScript to unobfuscate and insert an email address for now must enable
Use JavaScript to decrypt and insert an email address must enable
Use JavaScript to pattern replace and insert an email address must enable
Use AJAX to retrieve and insert an email address must enable
Use CSS to insert an email address      
Image and Flash insertion methods:
Replace “@” with an image of an “@”    
Replace the whole address with an image    
Replace the whole address with a Flash animation must enable    
Page hiding methods:
Use “robots.txt” for a web site
Use meta tags on a page - nofollow  
Use meta tags on a page - noindex  
Use “nofollow” on links
Use JavaScript to link to a hidden page must enable
Use Flash to link to a hidden page must enable  
Use a form to link to a hidden page  
Embed a page within a frame  
Embed a page within an iframe  
Redirect to a “mailto” link - PHP  
Redirect to a web page - PHP    
Redirect to a web page - Apache    
Redirect to a web page - Meta refresh tag      
Spammer blocking methods:
Block access based upon the IP address  
Block access based upon the user-agent  
Require a login to access the site    
Contact form methods:
Use a contact form    
Add a CAPTCHA image challenge    
Add a CAPTCHA math challenge  

About two thirds of the email address protection methods tested are not effective at stopping spammers. Surprisingly, many popular methods are not effective, including:

  • Obfuscating an email address with ASCII character codes.
  • Replacing “@” with “at”, “(at)”, or “[at]” in an email address.
  • Embedding HTML tags within an email address.
  • Using “robots.txt”, meta tags, and URL “nofollow” attributes to stop web spiders from visiting site pages.

Overall, most email address protection methods that are effective also have poor usability and accessibility. Tricky methods sometimes don’t work in all browsers. Contact forms, login pages, Flash animations, and CAPTCHA challenges will annoy some visitors. If you obsess too much about protecting your email address from spammers, you’ll also block or annoy legitimate visitors. Spam may be the price you pay for maintaining open communications lines for your web site’s visitors.

Out of all of these methods, one method stands out as being effective, usable, accessible, functional in all web browsers, JavaScript-free, plugin-free, and easy to author and maintain:

Split the email address onto two lines

like this:

User: person
Domain: example.com

However, keep in mind that while an automated harvester may be stopped by this and some of the other methods, a human harvester will get them all. No matter how well you protect your address, you’ll probably still get some spam. Use a spam filter for your email program. A 2005 U.S. Federal Trade Commission study, Email Address Harvesting and the Effectiveness of Anti-Spam Filters (PDF), found that 95% of spam could be stopped by a spam filter.

What else can you do?

There are several more methods that I didn’t try because they are complex to set up or they won’t work (yet) in most web browsers:

  • Use a disposable email address. Publish your email address as plain text and don’t worry about protecting it. When it starts getting too much spam, delete the old email address and make a new one. This can be a hassle to manage, so there are many web services available that provide disposable email addresses. The free flow-to.com service has an interesting twist. Instead of posting your email address to a web page, you post a special link to their site. The site responds with a generated one-time-use email address that, when mailed to within 24 hours, will forward the email to you. If harvested, the generated address is unlikely to be valid by the time it gets used.
  • Draw an email address using Scalable Vector Graphics (SVG). SVG is a formatting language that describes 2D drawings containing lines, areas, and labels. A drawing could include an email address text label or an address drawn with lines. While Adobe has an SVG Viewer plugin, the goal is to build SVG support directly into web browsers. SVG is partially supported by some current web browsers, such as Firefox and Opera, but not yet by Safari or Internet Explorer.
  • Draw an email address using Microsoft’s Silverlight plugin. This is Microsoft’s new Flash-like plugin to draw shapes on a web page. Like Adobe’s Flash, it can be used to draw an email address. As of this writing, the plugin has just been announced in a beta release. It may take years for it to become widely available.
  • Draw an email address in a movie shown by Apple’s QuickTime, Real’s RealPlayer, or Microsoft’s Windows Media Player plugins. Most visitors probably have one of these movie players installed. You could create an MPEG4, AVI, QT, or WMV movie containing your email address and play it on a web page. However, the approach is pretty awkward and it will slow down page loads while the browser waits to get the movie from your web server.
  • Draw an email address using Java. The Java programming language is an excellent way to create interactive applications, including those started from a Web page. However, there are technical complexities in making Java work for all web browsers and Java is way overkill for protecting a single email address.
  • Draw an email address using PDF. This format is widely used for posting fully-formatted documents on the web. Adobe’s Acrobat Reader or Apple’s Preview can show these documents. While a PDF document is an awkward way to protect one email address, it would work well to protect a list of email addresses for a company contact list.
  • Draw an email address using other plugins. There are many more plugins for web browsers. You could use Elsevier’s MDL plugin and draw an email address as a chemical formula. Or use a VRML or X3D plugin to draw your email address in flashy 3D. Or embed an email address in blueprints using AutoCAD’s DXF format, if visitors have a plugin to show it. In certain markets, these plugins are common. But on the web at large, visitors are unlikely to have or want these plugins.
  • Be obscure. Never use a popular method of protecting your email address. If a method gets too popular, spammers will implement a way around it in their email harvesters. If everybody starts using images for email addresses, spammers will add optical character recognition (OCR) to extract them. If everybody uses JavaScript schemes, spammers will add JavaScript support. So, don’t follow the crowd.

What email harvesters did I test?

Sorry, I won’t publish (or email you) the names of the email harvesters I tested. I don't want their product names on my web pages, and the unwanted search engine attention that that would bring. And I don't want their developers using my testing to show how well their products work compared to their competitors.

If you really want to do your own tests, search the web with the obvious keywords.

Beware of doing your own testing

As I collected and tested these email harvesters, I observed a few things:

  • One email harvester came with a virus embedded. Installation of two harvesters was blocked by Windows when they tried to access protected memory. One email harvester’s installation tried to add an unexplained background service to Windows.
  • Every email harvester must access the Internet. Are they just harvesting, or is there anything else that they are sending to and from your computer? One email harvester had a suspiciously high CPU and bandwidth usage while it was idle.
  • Several email harvesters can scan a hard drive too. You can aim them at your web browser cache. Or consider how others might use them at an Internet cafe, library, or other public computer site. Or as a virus payload on your computer.
  • Several email harvesters could be installed as invisible Internet Explorer plugins to scan web pages as they are browsed. Consider how that might be used at an Internet cafe.
  • Several email harvester makers also sell Internet cafe management products. How convenient.
  • One email harvester maker also makes a “corporate monitoring” product that does keystroke logging, provides hidden remote access to a PC’s files, and enables remote controls to start applications such that they won’t show up in the Windows task manager. That’d be a handy tool for controlling spammer zombies.
  • There are dozens of bulk emailers for sending “newsletters” to a mailing list. Some can be started under remote control without anything showing up on the computer’s screen.
  • One application at a bulk email software site helpfully emails you whenever a PC's dynamic IP address changes. That’d be handy for keeping track of spammer zombies.
  • Several email harvester developers actually have “anti-spam” policies on their web site. We’re assured that they are “strictly opposed” to spam.
  • My favorite harvester/mailer company tag line: “We make a better world.”

If you really must try these yourself, be careful.

Future email harvesting technology

Spammers use the same type of computer as you or I. And each year, the latest computer on the market is almost twice as fast as the year before. When a spammer upgrades their computer, they can use the faster processor to do more sophisticated text scanning and to do a better job of defeating email masking schemes. What could they do in the near future?

Text scanning has two phases: lexical analysis and syntax parsing. Lexical analysis looks for patterns in a sequence of characters, and syntax parsing looks for patterns in those patterns. Lexical analysis is fast, but syntax parsing is slower. For maximum harvesting speed, most current email harvesters use simple lexical analysis and no syntax parsing. They recognize obvious patterns of characters, but they do not look at the context around those characters. With faster computers, spammers can use smarter software to better extract email addresses. When they do, many of today’s email address protection methods will become ineffective.

For example, a regular expression used by a lexical analyzer can easily find an email address like “person@example.com.” Here’s the expression:

/[A-Z0-9_\-\.]+@[A-Z0-9_\-\.]+/gi

For most people this looks like jumbled nonsense, but to a programmer and a regular expression parser, this says “one or more letters, digits, underscores, hyphens, or periods; then an @; then one or more of those same characters again,” matched anywhere in the text and ignoring case. You can test this yourself, and learn more about regular expressions, by using Rob Locher’s nice Regular Expression Tester.
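
As a rough illustration of how little work this takes, here is a minimal JavaScript sketch that runs the expression above over a string of page text. The helper name `findPlainAddresses` and the sample text are made up for this example. It can be handy for checking what a simple scanner would see in your own page source.

```javascript
// The expression from the article: a run of letters, digits, "_", "-", or ".",
// then "@", then another such run. "g" finds every match, "i" ignores case.
const PLAIN_ADDRESS = /[A-Z0-9_\-\.]+@[A-Z0-9_\-\.]+/gi;

// Hypothetical helper: return every substring that matches the expression.
function findPlainAddresses(text) {
  return text.match(PLAIN_ADDRESS) || [];
}

const sample = "Contact person@example.com or Person2@Example.org today";
console.log(findPlainAddresses(sample));
```

Note that the pattern is deliberately loose: because "." is an allowed character, a period that ends a sentence right after an address would be picked up as part of the match.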

Email harvesters are already doing this. It is very easy to extend this to look for “person AT example.com,” a masking technique recommended by the U.S. Federal Trade Commission back in 2005. Here’s the expression:

/[A-Z0-9_\-\.]+ *(@|AT) *[A-Z0-9_\-\.]+/gi

Email harvesters are already doing this too, so I’m not giving anything away here.
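
Here is a hedged sketch of the same idea in JavaScript. It uses the expression above with capture groups added around the user and domain parts (my addition for this example, so the pieces can be reassembled). The helper name `findAtAddresses` and the "domain must contain a dot" filter are assumptions for illustration, not something taken from any tested harvester.

```javascript
// The article's second expression, with capture groups added around the
// user and domain parts (an addition for this sketch).
const AT_ADDRESS = /([A-Z0-9_\-\.]+) *(?:@|AT) *([A-Z0-9_\-\.]+)/gi;

// Hypothetical helper: find "user AT domain" candidates and normalize them.
function findAtAddresses(text) {
  const found = [];
  for (const match of text.matchAll(AT_ADDRESS)) {
    const user = match[1];
    const domain = match[2];
    // The loose pattern also fires inside ordinary words such as "water"
    // ("w" + "at" + "er"), so keep only candidates whose domain has a dot.
    if (domain.includes(".")) {
      found.push(user + "@" + domain);
    }
  }
  return found;
}

console.log(findAtAddresses("Mail person AT example.com or admin@example.org"));
```

The filter shows why a pattern this permissive needs a second pass: with optional spaces, any word with "at" in the middle is a candidate until something rules it out.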

It is pretty easy to extend this to match any predictable pattern for representing an email address. All of the following can be recognized in a similar way:

person@example.com
person @ example.com
person at example.com
person(at)example.com
person[at]example.com
person&#64;example.com
person&#x40;example.com
person%40example.com
person<!--comment-->@example.com
person<tag></tag>@example.com
person<tag>@</tag>example.com
<tag>person@</tag>example.com

Some email harvesters are already doing this. Slightly more sophisticated lexical analysis can recognize email addresses that use HTML, URL, or JavaScript character codes. And harvesters are doing this now too.
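
To make that concrete, here is a minimal sketch of a normalizer that undoes several of the predictable variations listed above before matching. Everything in it (the names `charFromCode`, `unmask`, and `findMaskedAddresses`, and the order of the steps) is an assumption for illustration; real harvesters were not inspected at the source level. Note that URL character codes are hexadecimal, so "@" is written as "%40".

```javascript
// Hypothetical helper: turn a numeric character code into its character,
// leaving the original text alone if the number is not a valid code point.
function charFromCode(code, original) {
  return code <= 0x10ffff ? String.fromCodePoint(code) : original;
}

// Hypothetical normalizer: undo several predictable masks.
function unmask(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "")   // drop HTML comments
    .replace(/<[^>]*>/g, "")           // drop HTML tags
    .replace(/&#x([0-9a-f]+);/gi, (m, hex) => charFromCode(parseInt(hex, 16), m))
    .replace(/&#([0-9]+);/g, (m, dec) => charFromCode(parseInt(dec, 10), m))
    .replace(/%([0-9a-f]{2})/gi, (m, hex) => String.fromCharCode(parseInt(hex, 16)))
    .replace(/\s*(?:\(at\)|\[at\])\s*/gi, "@")  // "(at)" and "[at]"
    .replace(/\s+at\s+/gi, "@")                 // " at " spelled out
    .replace(/\s*@\s*/g, "@");                  // spaces around "@"
}

const ADDRESS = /[A-Z0-9_\-\.]+@[A-Z0-9_\-\.]+/gi;

// Match after unmasking; keep only candidates whose domain contains a dot,
// since the " at " step also rewrites ordinary phrases like "look at this".
function findMaskedAddresses(html) {
  const candidates = unmask(html).match(ADDRESS) || [];
  return candidates.filter((c) => c.split("@")[1].includes("."));
}

console.log(findMaskedAddresses("person<!--comment-->&#64;example.com"));
```

Each step is a single cheap text substitution, which is the point: none of this needs a real HTML parser.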

All of this can be done with fast simple lexical analysis. When you add in syntax parsing, you can find more complex patterns where the name and address are more spread out. For instance, syntax parsing can recognize the method I recommended earlier:

User:  person
Domain: example.com

Syntax parsing can also recognize an address whose characters are distributed into HTML table cells. A parser can easily extract rows and columns from a table. Some email harvesters can do this now.

Any predictable pattern can be parsed. It just takes a bit more computer time. And computer time is cheap and getting cheaper. The email address split onto two lines above is only safe today because it isn’t a common enough pattern yet for spammers to have bothered adding it to their harvesters.
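
As a toy stand-in for that kind of parsing, the sketch below reads labelled lines and joins the two labelled values into an address. The function name `parseSplitAddress` and the exact labels it accepts are assumptions for this example. It only handles the "User:" and "Domain:" layout shown earlier, which is exactly why less predictable wording is harder to parse.

```javascript
// Hypothetical parser for the two-line "User:" / "Domain:" layout.
// It looks at structure (label, colon, value) rather than at a single
// run of characters, then combines values found on different lines.
function parseSplitAddress(text) {
  const fields = {};
  for (const line of text.split(/\r?\n/)) {
    const m = line.match(/^\s*(user|domain)\s*:\s*(\S+)\s*$/i);
    if (m) {
      fields[m[1].toLowerCase()] = m[2];
    }
  }
  // Only report an address when both labelled parts were found.
  return fields.user && fields.domain
    ? fields.user + "@" + fields.domain
    : null;
}

console.log(parseSplitAddress("User:  person\nDomain: example.com"));
```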

Any email masking method based upon a predictable pattern is unsafe. Every numeric character code-based obfuscation method is unsafe. Every method that predictably spells out punctuation or adds spaces or HTML tags is unsafe. Every method that uses a well-known algorithm to scramble or encrypt the letters is unsafe. Every method that predictably spreads out an address onto multiple lines or into table cells is unsafe. If it is predictable, it is parsable by a future email harvester.

What about JavaScript? The JavaScript language was designed to be simple and fast to execute. It is possible to integrate it into an email harvester. The harvester will run slower, but computer time is cheap. So far, none of the harvesters I tested ran web page JavaScript. I expect this to change within the next few years.

Some email harvesters skip writing their own HTML parsers and just plug in to Internet Explorer. Current harvesters grab page text as it is loaded. Future harvesters may grab the text after JavaScript's “onload” functions have run (the way screen readers for the visually impaired do). When they make this change, JavaScript schemes that insert email addresses on page loads will become unsafe.

Unfortunately, almost every method used to mask email addresses also diminishes the usability and accessibility of the web. A real fix to the spam problem must involve stopping the spammers themselves, whether by legal or technical changes to the Internet.

Comments

email address spam article

Hi,
Thank you so much for this article. I have been searching for ways to protect our online addresses and finally came across yours. My options are limited as our university webmaster has control over most things, but at least I can try some of your more recommended methods.

Your article was comprehensive, accessible and stimulating. I just wish it hadn't taken me so long to find it!

SMC

split lines

I appreciate the comprehensive article. I'd like to have seen what an actual email looks like that utilizes the split line approach. Thank you!

Re: split lines

I'm glad you liked the article. The "split lines" approach is one of many ways to fragment an email address — and one of the most effective. Examples and further discussion are in the companion articles listed at the top of this article. Here's the link for the appropriate article and section: Stop spammer email harvesters by fragmenting email addresses

Thanks very much Dave for

Thanks very much Dave for your quick response! I read your article and even did a search about this. Well, call me a bit hard headed, but I guess what I'm trying to understand is if I were to apply this on my site then would the suggested email appear like this?

User: dean
Domain: example.com

or?

dean
example.com

or?

dean @
example.com

Could any of these work? I'm wanting to use something recognizable and effective instead of or in conjunction with a form. Once again, I appreciate the article, your work and your responsiveness. Thank you!

Re: Thanks very much Dave for

I use the first format above. Your second format should work fine, but I'd avoid the third one. The "@" notifies harvesters that an email address is present, and it isn't hard then to extract the previous and following words, ignoring white-space and HTML tags.

The general idea is to create an address presentation that doesn't match the expectations of a harvester's parser, and yet a human can understand it clearly. Here are a few more ways you might do this:

  • Example.com from Person
  • Person at the site example.com
  • Person using an account at example.com
  • Person receives email at example.com
  • Email me at Person at the site example.com
  • "Person" emailed at "example.com"
  • Account: Person and Site: example.com
  • Account: Person
    Site: example.com

Got it!

Awesome, got it through my hard head!! Thank you for the suggestions as that helps me to more fully understand how it works and appears!

Thanks again!

Introducing Liame

Hi!

I have just released Liame, an email obfuscator for asp.net and other technologies which uses some of the techniques described in this post.

I hope it can help to hide email addresses from spammers.

Regards.

Re: Introducing Liame

I just read your web page on Liame. Very nice. Your scheme to generate a "mailto" link via JavaScript and an unpredictable encoding scheme should work well.

Watch out, though, for what you do for <noscript> tags. Security is only as good as the weakest link. If you offer a non-JavaScript fallback that puts a parsable email address in a <noscript> tag, then harvesters will just get that address instead. And if you don't put any email address there, users without JavaScript enabled won't be able to email you. Current web statistics say that's about 5% of users, and decreasing, so this may not be a big problem.

Anyway, nice job and thanks for doing this and making it available as Open Source.

About Liame

Thank you for your comments.

Regarding the noscript tag, you are right. In fact, Liame by default generates the text "activate script to read" in this tag, so it doesn't show the original address.

Alternatively, Liame can insert the email using some CSS techniques (invisible tags, reverse address) that are considered "secure" against the email harvesters. Anyway, it's up to the developer to decide whether to use it or not.

And thanks, again, for testing Liame. :-)

Best regards.

Usability?

As I understand it, you advocate that presenting the user with

Email name: nadeau
At domain name: nadeausoftware.com

as you do on your "About" page, requiring the user to copy and paste these into a mail client, is more usable than asking them to fill out a contact form? Granted, from the coding side, the split name/domain concept requires no coding knowledge or database access, but I don't see how it is more usable to the end user than a simple form.

Also, another method is to use a form that has a field hidden by CSS that is not to be filled in but is labelled with something a spam harvester may be looking for .. say "Subject". The hidden field can be further labelled to say "don't fill me in".

Subject:
Ignore this text box. It is used to detect spammers. If you enter anything into this text box, your message will not be sent.

Note, I may hide this differently in CSS since this would hide the content from screen readers (maybe good/maybe bad) but as an example ...

Then when the form is submitted, any forms that have content in this field may be rejected as spam.
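
The server-side half of this idea could be sketched roughly as follows, in JavaScript. The function name `isTrapTriggered` and the field name "subject" are assumptions for illustration. How the submitted fields reach this function depends on whatever server framework is in use.

```javascript
// Hypothetical server-side check for a hidden "trap" form field.
// "fields" is assumed to be a plain object of submitted form values,
// and "subject" is the assumed name of the field hidden by CSS.
function isTrapTriggered(fields, trapName = "subject") {
  const value = fields[trapName];
  // A human never sees the field, so any non-blank content is suspicious.
  return typeof value === "string" && value.trim() !== "";
}

console.log(isTrapTriggered({ message: "Hello", subject: "" }));
console.log(isTrapTriggered({ message: "Buy now", subject: "Great offer" }));
```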

Also, if more protection is needed, the contact form can be two-part: 1) enter data, 2) redisplay the data in an "edit or submit" form. This would mean the harvester must get by the hidden form "trap" and then respond to the result form as well. Less usable but more spam-free.

Finally, if some sort of captcha is needed, I advocate using a true Turing mechanism: simple questions taken from a database. Responses could require a simple text response or could be multiple choice. Generating a large number of these shouldn't be too difficult, and more could be added as time goes on. Example:

What is larger, an ant or an elephant?
How many letters in the word red?
What number is smallest: 5 or 7 or 2?

These are not subject to pattern recognition (like captcha and audio captcha).

Re: Usability?

Unfortunately security and usability are always at odds. You can make any system more secure by putting up more barriers to entry. But that makes it less usable. When you build a web user interface where you want users to contribute, you have to favor usability or users won't contribute.

Contact forms
As you suggest, you can make contact forms more secure by adding CAPTCHAs and multiple entry steps, but it annoys users. Your users just want to contact you, not be quizzed. Many adults will be insulted, and some users won't understand the questions, their relevance, or how to answer them. Making up questions which "everybody" can answer, regardless of age, education, language, and social background, is not easy. And if you use a standardized database of questions, then spammers will use the same database to crack your form and post spam.

Also, spammers do form spam by training a spam program to post junk into appropriate fields and automatically "push" submit. Since that training is done by a human, if a human can figure out your form, then a spammer can post spam to it. Adding special "Ignore this text box" fields won't help. Naming fields strangely won't help. Hiding fields with CSS won't help. The only workable solution I know of is to use a spam filter to process all content submitted by forms. I use Akismet, and it works very well.

Email vs. contact forms
Beyond these problems, contact forms are also redundant. Users already know how to send messages to people... by email. Why introduce a different way? The cardinal rule in user interface design is Don't make the user think. Rely on patterns the user already knows so that they don't have to interrupt their train of thought to figure out non-standard buttons, quirky scrollbars, or a different way to send email. The most usable approach is always the one the user already knows.

Additionally, in this world with massive email spam, users always have an email spam filter. One of the key ways a spam filter detects spam is to see if the sender is in the user's address book or it's someone the user has emailed recently. So, if a user contacts you by email, your email address will be in their recent-addresses list. When you respond by email, their filter will let your message through. However, if a user is forced to contact you by a contact form, your email address won't be in their recent-addresses list. When you respond to them by email, your message may get flagged as spam and deleted.

Ultimately, spam is the price we pay for trying to maintain open communications lines with our users.

Oh ... and Thanks!

Oh yes ... I forgot to say thanks for a very informative article and survey!

Re: oh... and Thanks!

:-) Thanks. Glad you liked it and thanks for your earlier posting with contact form ideas.

Idea.

Here's my idea.

Have a fake email address that's visible in the page, and use javascript to REPLACE this bad spamtrap email with the good one.

The beauty of this trick is that the harvester gets an unobfuscated email address, never suspecting that the real email address is elsewhere, while site visitors with javascript enabled never see the bad address, and see the good one instead.

They just click the mailto, and it works. But spammers get a different address that sends everything to the bitbucket, so they don't know to turn on the javascript.

Re: Idea

If you use JavaScript to add a correct email address, users must have JavaScript enabled in order to see it. About 95% of users do, but do you want to annoy the 5% that don't by giving them a bogus email address? A more polite approach might be to leave the email address empty by default.

Also, you aren't tricking a person... you're tricking a harvester tool. The tool doesn't know what to "suspect" about your page. If you have no email addresses there at all, it won't get suspicious, wring its hands, and cast an evil I'm-going-to-get-you glare at your web site. It'll just move on to the next page. So, don't waste time trying to trick a harvester with a bogus email address.

And don't think a harvester will come back through with JavaScript turned on. They won't. Harvesters are text processing tools, not web browsers, so they don't have JavaScript support. While it is conceivable that a future harvester will support JavaScript, such a harvester will run much much slower than today's harvesters. And since spammers want to extract the most email addresses possible per second, making the harvester slower is a bad thing. If they miss a few addresses because of JavaScript trickery, they'll make up for it by harvesting many more addresses from other pages that don't use trickery.

So, right now, JavaScript email address insertion is a fairly safe approach. But be sure your JavaScript doesn't have the address in a simple text string or the harvester will see it as it scans the JavaScript code.
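
Here is a minimal sketch of that last point. The address is assembled at run time from separate pieces, so the complete address never appears as one string literal in the script. The helper names `buildAddress` and `insertAddress` and the element id "contact" are assumptions for this example, and the sketch has not been run against any harvester.

```javascript
// Hypothetical helper: assemble the address at run time from separate pieces.
// The "@" is produced from its character code (64) instead of being typed.
function buildAddress(userParts, domainParts) {
  return userParts.join("") + String.fromCharCode(64) + domainParts.join(".");
}

// Hypothetical helper: add a mailto link inside the element with the given id.
function insertAddress(doc, elementId, address) {
  const el = doc.getElementById(elementId);
  if (!el) {
    return false;
  }
  const link = doc.createElement("a");
  link.href = "mailto:" + address;
  link.textContent = address;
  el.appendChild(link);
  return true;
}

// In a browser, insert the address once the page has loaded.
// "contact" is an assumed element id; the page would leave that element empty.
if (typeof document !== "undefined") {
  window.addEventListener("load", () => {
    insertAddress(document, "contact", buildAddress(["per", "son"], ["example", "com"]));
  });
}
```

Leaving the target element empty by default follows the suggestion above: visitors without JavaScript see nothing rather than a bogus address.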

PDF email-image converter

Hi!
I am trying to find a tool that will automatically search a PDF file, find email addresses and change them to an image in order to stop spammers. Your article entitled "Stop spammer email harvesters by drawing addresses with images or Flash." touches on this, but do you know of any reliable tools that are specifically tailored to deal with PDF files?
Thanks so much!

Re: PDF email-image converter

Sorry, I have not looked in to such a tool.

Great article

Thank you very much for your article. It is so comprehensive and helpful :)

Nadeau software consulting