Stop spammer email harvesters by obfuscating email addresses

The email harvesters (spambots) used by spammers scan your web pages looking for email addresses to add to their spam mailing lists. Obfuscating an address obscures or scrambles its characters, making it harder for a harvester to recognize. The most common method replaces characters with their numeric ASCII character code equivalents. Browsers automatically unobfuscate the address so that site visitors can read it. While this is a popular method to protect an email address, the harvester tests reported in this article show that newer harvesters now recognize many of these addresses.

This article is part of a series on Effective methods to protect email addresses from spammers that compares and tests 50 ways to protect email addresses published on a web site.

How to obfuscate an email address

Publishing an email address on a web page enables site visitors to contact you, but it exposes your address to email harvesters (“spam robots” or “spambots”) that spammers use to scan your site for addresses. Obfuscating (also called “munging” or “encoding”) an address protects it by changing or rearranging the characters, making it harder for a harvester to recognize your address. And if harvesters can’t recognize your protected address, they won’t add it to mailing lists and you’ll get less spam.

The most common way to obfuscate an email address converts (encodes) its characters into numeric ASCII character codes. For example, 97 is the ASCII code for an “a”, 98 for a “b”, and so on. Browsers automatically convert these codes back into characters so that your protected address is readable by site visitors. But many spambots are left confused.

Below I discuss each of the most common ways to obfuscate an email address. After this list, I report the results of running obfuscated addresses past a collection of email harvesters to see which ways are effective at protecting your address, and which are not.

Replace the “@” with a character code

An easy way for harvesters to find an email address on a web page is to look for the “@” character. The text to the left and right of the “@” is the email address. Obfuscate this character by replacing it with its decimal ASCII character code: 64 (here’s a table of ASCII character codes). In HTML, this is written as “&#64;” (a common typo is to forget the semi-colon after the number). If harvesters can’t find the “@”, they won’t find your protected email address.

HTML person&#64;example.com
Result person@example.com

You can use a hexadecimal code instead, and the obfuscated “@” becomes “&#x40;” (note the added “x” before the number).

HTML person&#x40;example.com
Result person@example.com
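The replacement described above is mechanical, so it is easy to script. Here is a minimal sketch in Python; the function name obfuscate_at is my own for illustration and is not taken from any of the converters mentioned in this article.

```python
import html


def obfuscate_at(address: str, hexadecimal: bool = False) -> str:
    """Replace each "@" with its HTML numeric character reference."""
    # 64 is the decimal ASCII code for "@"; 40 is the same value in hexadecimal.
    entity = "&#x40;" if hexadecimal else "&#64;"
    return address.replace("@", entity)


decimal_form = obfuscate_at("person@example.com")
hex_form = obfuscate_at("person@example.com", hexadecimal=True)
print(decimal_form)
print(hex_form)

# html.unescape decodes character references much as a browser would,
# so both forms should turn back into the readable address.
print(html.unescape(decimal_form), html.unescape(hex_form))
```

The last line is only a sanity check that the obfuscated text still decodes to the original address.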

Replace the whole address with character codes

If obfuscating the “@” is good, then obfuscating the entire email address may be better. So replace every character with its decimal equivalent:

HTML &#112;&#101;&#114;&#115;&#111;&#110;&#64;&#101;&#120;
&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;
Result person@example.com

You can use hexadecimal codes instead:

HTML &#x70;&#x65;&#x72;&#x73;&#x6f;&#x6e;&#x40;&#x65;&#x78;
&#x61;&#x6d;&#x70;&#x6c;&#x65;&#x2e;&#x63;&#x6f;&#x6d;
Result person@example.com

And you can mix decimal and hexadecimal codes in the same email address.

HTML &#112;&#x65;&#114;&#x73;&#111;&#x6e;&#64;&#x65;&#120;
&#x61;&#109;&#x70;&#108;&#x65;&#46;&#x63;&#111;&#x6d;
Result person@example.com
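If you would rather not rely on an online converter, the three variants above can be generated with a few lines of Python. This is a sketch only: the function name encode_entities and the "mix" rule (alternating decimal and hexadecimal by character position) are assumptions of mine; any mixture decodes the same way.

```python
import html


def encode_entities(address: str, style: str = "decimal") -> str:
    """Encode every character as an HTML numeric character reference.

    style is "decimal", "hex", or "mix" (alternate decimal and hex).
    """
    if style not in ("decimal", "hex", "mix"):
        raise ValueError("style must be 'decimal', 'hex', or 'mix'")
    parts = []
    for index, char in enumerate(address):
        code = ord(char)
        # In "mix" style, odd positions use hexadecimal (an arbitrary choice).
        use_hex = style == "hex" or (style == "mix" and index % 2 == 1)
        parts.append(f"&#x{code:x};" if use_hex else f"&#{code};")
    return "".join(parts)


for style in ("decimal", "hex", "mix"):
    encoded = encode_entities("person@example.com", style)
    # Decoding should always give back the readable address.
    assert html.unescape(encoded) == "person@example.com"
    print(style, encoded)
```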

Andreas Neudecker’s Spam-me-not web page has a free converter that generates the character code version of an email address. A web search (try “email obfuscator”) will find many other converters ("obfuscators", "encoders", "mungers"), but Neudecker’s works well and supports both decimal and hexadecimal codes.

If you want a “mailto” link that points to your email address, you’ll need to protect the visible address and the address inside the link’s “href”. If either one is left unprotected, spambots will find the address and you’ll get spam. Neudecker’s web page will generate the “mailto” link too.

HTML <a href="mailto:&#112;&#101;&#114;&#115;&#111;&#110;&#64;&#101;&#120;
&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;">
&#112;&#101;&#114;&#115;&#111;&#110;&#64;&#101;&#120;
&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;
</a>
Result person@example.com
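A link like the one above can be assembled the same way. The sketch below assumes decimal codes for both copies of the address and leaves the “mailto:” prefix unencoded, as in the example; the function names are illustrative only.

```python
import html


def encode_entities(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(char)};" for char in text)


def mailto_link(address: str) -> str:
    """Build a mailto link with both the href and the visible text encoded."""
    encoded = encode_entities(address)
    # The same encoded string is used twice: inside the href and as link text.
    return f'<a href="mailto:{encoded}">{encoded}</a>'


link = mailto_link("person@example.com")
print(link)

# Neither copy of the address should contain a literal "@" in the page source.
assert "@" not in link
# After decoding, the link is an ordinary mailto link.
print(html.unescape(link))
```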

Replace the address in a mailto link with URL character codes

URLs in the “href” of a link can be encoded in another way. Instead of writing the character’s ASCII code in the HTML style, like “&#64;”, use the URL style that puts a “%” before the two-digit hexadecimal code, like “%40” for the “@” (and no semi-colon at the end).

HTML <a href="mailto:%70%65%72%73%6f%6e%40%65%78%61%6d%70%6c%65%2e%63%6f%6d">
email me</a>
Result email me

The Mysterious Ways web site design company’s Protect email addresses web page has a free converter that generates obfuscated email links like this. A web search will find other similar obfuscated URL encoders.
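For reference, here is a small Python sketch of the URL-style encoding. It assumes a plain ASCII address, and the function name percent_encode is my own. The standard library's urllib.parse.unquote is used only to confirm that the encoded form decodes back to the address.

```python
from urllib.parse import unquote


def percent_encode(address: str) -> str:
    """Encode every character as "%" plus its two-digit hexadecimal code."""
    # encode("ascii") raises an error for non-ASCII input instead of
    # silently producing an invalid URL.
    return "".join(f"%{byte:02x}" for byte in address.encode("ascii"))


encoded = percent_encode("person@example.com")
link = f'<a href="mailto:{encoded}">email me</a>'
print(link)

# Decoding the percent codes gives back the readable address.
assert unquote(encoded) == "person@example.com"
```

Note that the visible link text here is “email me” rather than the address, so only the “href” needs protecting.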

Use CSS to reverse a backwards email address

Leonardo da Vinci wrote his private notebooks in backwards text. You can use the same idea to protect your email address by writing it backwards, and then using CSS to flip it forwards again when it’s shown on a web page. Since spambots do not understand CSS, they won’t flip the text and get your protected email address.

HTML <span class="reverse">moc.elpmaxe@nosrep</span>
CSS .reverse { unicode-bidi:bidi-override; direction:rtl; }
Result moc.elpmaxe@nosrep

CSS text direction control is intended to support languages that are written from right-to-left, such as Hebrew and Arabic. It is supported by most (but not all) current web browsers. If your browser shows the result above as “person@example.com”, your browser supports text reversal.
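Writing an address backwards by hand is error-prone, so here is a minimal Python sketch that produces the reversed markup. The class name “reverse” matches the example above; the function name reversed_span is an assumption for illustration. The CSS rule still has to be added to your style sheet separately.

```python
import html


def reversed_span(address: str, css_class: str = "reverse") -> str:
    """Wrap the backwards address in a span for the CSS rule to flip."""
    backwards = address[::-1]  # reverse the characters
    # html.escape is a precaution in case the text contains "<" or "&".
    return f'<span class="{css_class}">{html.escape(backwards)}</span>'


CSS_RULE = ".reverse { unicode-bidi:bidi-override; direction:rtl; }"

print(reversed_span("person@example.com"))
print(CSS_RULE)
```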

Use a <bdo> tag to reverse a backwards email address

Instead of using CSS to flip backwards text, you can use HTML’s <bdo> tag. Use this tag to take a backwards email address and flip it. Since spambots do not understand HTML tags, they won’t know how to flip the text to get your protected email address.

HTML <bdo dir="rtl">moc.elpmaxe@nosrep</bdo>
Result moc.elpmaxe@nosrep

Most (but not all) current browsers support this tag. If your browser shows the result above as “person@example.com”, your browser supports text reversal.
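The sketch below generates the <bdo> markup and then shows what a program sees when it reads the page text without applying the direction override. It uses Python's standard html.parser module; the names reversed_bdo, TextExtractor, and source_text are hypothetical, and a real harvester may read pages quite differently.

```python
from html.parser import HTMLParser


def reversed_bdo(address: str) -> str:
    """Wrap the backwards address in a right-to-left <bdo> tag."""
    return f'<bdo dir="rtl">{address[::-1]}</bdo>'


class TextExtractor(HTMLParser):
    """Collect the text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def source_text(markup: str) -> str:
    """Return the text content of the markup in source order."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


markup = reversed_bdo("person@example.com")
print(markup)
# Without the browser's text reversal, the address is still backwards.
print(source_text(markup))
```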

Results

I tested 23 widely available email harvesters to see how well these methods protect an email address. Each harvester was aimed at a test page containing plain and obfuscated email addresses. In the table below, a harvester gets a check mark if it recognizes the protected address.

All of the harvesters were tested on Windows XP SP2. The names of the harvesters are intentionally left out so that this web page does not attract search engine traffic from spammers looking for the “best” harvester to download.

Protected email address test results
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Plain email address
Replace the “@” with a character code - decimal                                    
Replace the “@” with a character code - hexadecimal                                            
Replace the whole address with character codes - decimal                                      
Replace the whole address with character codes - hexadecimal                                            
Replace the whole address with character codes - mix                                            
Replace the address in a mailto link with URL character codes                                          
Use CSS to reverse a backwards email address       *     *       *                     *  
Use a <bdo> tag to reverse a backwards email address       *     *       *                     *  

As expected, every harvester found the plain email address that was not protected.

Five of the harvesters found email addresses obfuscated by encoding them using decimal ASCII character codes, and one harvester found those using hexadecimal codes. Two of the harvesters found email addresses encoded using URL “%” codes.

Several of the harvesters (cells marked with “*”) found the backwards email addresses, but none of the harvesters flipped them back to normal.

Conclusions

Obfuscating an email address by encoding it with ASCII character codes is a popular idea, but it doesn’t work well. About a quarter of the tested harvesters found the obfuscated addresses. The most successful spam robot in this test (number 13) was released back in 2003, so this protection method has been bypassed for quite a while. One harvester download site even offered a free email address obfuscator, which, of course, their harvester could bypass. Obfuscating an email address with ASCII character codes is not an effective way to stop email harvesters.
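To see why character codes offer so little protection, consider how little work it takes to undo them. The Python sketch below is not the code of any tested harvester; it is an assumed, simplified model that compares scanning the raw page source with scanning the source after two standard-library decoding calls. The address pattern is deliberately crude.

```python
import html
import re
from urllib.parse import unquote

# A simple address pattern: local part, "@", then a domain with at least one dot.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_addresses(source: str, decode: bool) -> list:
    """Return addresses found in the page source, optionally decoding first."""
    if decode:
        # Undo HTML character references, then URL percent codes.
        source = unquote(html.unescape(source))
    return ADDRESS.findall(source)


page = (
    '<a href="mailto:%70%65%72%73%6f%6e%40%65%78%61%6d%70%6c%65%2e%63%6f%6d">'
    "&#112;&#101;&#114;&#115;&#111;&#110;&#64;&#101;&#120;"
    "&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;</a>"
)

print(find_addresses(page, decode=False))  # nothing found in the raw source
print(find_addresses(page, decode=True))   # both copies of the address found
```

Two library calls are enough to defeat both encodings at once, which is consistent with the test results above.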

The backwards email addresses were usually ignored by harvesters because they don’t look valid. But five of the harvesters accepted them anyway. Once addresses are harvested, spammers run them past email address “verifiers” that probe email servers to confirm that the addresses are good. If backwards addresses become popular (they aren’t yet), it is easy for a programmer to enhance a verifier to check each address both forwards and backwards. Backwards email addresses are only effective until the method becomes popular.
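A short Python sketch illustrates both points. The two patterns are assumptions of mine, not taken from any tested harvester: a strict pattern that requires a dot in the domain rejects the backwards address, a lenient one accepts it, and trying each candidate in both directions recovers the real address.

```python
import re

# Strict: the domain must contain at least one dot (e.g. "example.com").
STRICT = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# Lenient: anything that looks roughly like text@text is accepted.
LENIENT = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9.-]+")


def plausible_forms(candidate: str) -> list:
    """Return the candidate and its reverse, keeping only valid-looking forms."""
    forms = [candidate, candidate[::-1]]
    return [form for form in forms if STRICT.fullmatch(form)]


backwards = "moc.elpmaxe@nosrep"

# The backwards text has no dot after the "@", so a strict check rejects it.
print(bool(STRICT.fullmatch(backwards)))
# A lenient check accepts it as is.
print(bool(LENIENT.fullmatch(backwards)))
# Checking both directions finds the readable address.
print(plausible_forms(backwards))
```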

Backwards email addresses have poor usability and accessibility. While the protected address looks normal in a web browser, if you copy and paste it into an email program it comes out backwards. Also, screen readers used by the visually impaired are confused by the backwards address and speak it backwards.

Recommendation: don’t depend upon obfuscation to protect your email address. Spammers have figured out this trick. Use one of the better protection methods discussed in the other articles in this series.

Comments

None of these mechanisms

None of these mechanisms will work.

*Spambots do not honor, let alone even download the css file from a website.
*Character codes are translated in newer spambots.
*The bdo tag only reverses the text for a valid html interpreter! (Only humans see it backwards!). When a spambot downloads the html source, it looks at the source without interpreting it!

Re: None of these mechanisms

Yes and no. Your points are correct, but your conclusion is only partially correct.

To effectively protect an email address we want to deliver a readable address to a human, and an unreadable one to a spambot. Your first point is right: Spambots don't read CSS. This enables us to use CSS tricks to unobfuscate an address in a browser while it remains obfuscated to the spambot. This works.

Your second point is right: Spambots can read numeric character codes, so this mechanism doesn't work.

Your third point is right: Spambots don't reverse text in a bdo tag. This enables us to use it to un-reverse a reversed address in a browser while it remains reversed to the spambot. This works.

So, your conclusion goes too far: two of the methods you highlight do work, and one does not. Nevertheless, I wouldn't recommend using the CSS or bdo tricks since both have usability and accessibility problems and they may not work in all browsers.

There's a tool for Mac users

Mac OS X users can also use a Dashboard widget called obfuscatr. It provides JavaScript or just plain hexadecimal encoding of your email addy. See the details at flash tekkie.

obfuscatr was also featured in MacWorld Italy of March 2008.

Re: There's a tool for Mac users

Well, the point of this article is that hexadecimal encoding doesn't work to stop spammers. So, having an easy Mac tool to do it for you doesn't get you anything except a false sense of security and more spam.

I've run across dozens of hexadecimal encoding tools, and several of them were written by the authors of the email harvesters we're trying to stop. Now, does it seem likely that a harvester author would tell you a correct way to fool their own harvester? Of course not. And as my tests have shown, there are multiple harvesters already available that decode hex encoded addresses. Don't waste your time with obfuscation schemes, no matter who recommends them.

spammers

are there any laws that prevent spammers. I mean if we put these people in jail would this stop it.

Re: spammers

In the US, the 2003 CAN-SPAM Act requires that unsolicited email not be sent to harvested email addresses, and that it contain a valid header, accurate "from" and "subject" lines, a physical address of the publisher, and an opt-out method. An invalid email header is a misdemeanor. Harvesting is an "aggravated offense."

To date, though, the CAN-SPAM act has had little or no impact, despite some high-profile arrests. There are related laws in other countries, and they too have had little impact. Less than 1% of spam complies with the CAN-SPAM act's header requirements. And some 75% of all email sent is spam.

Most spam is sent by bot nets sending billions of email messages a day through compromised PCs, or "bots". Tracking down the bot nets and shutting down their control networks has been effective recently, but it is not likely to be a long-term solution. It just makes spammers use trickier and trickier bot net control schemes.

Since most spam is sent by compromised PCs, one way to reduce spam is to clean out those PCs. Anti-virus products exist to do this, but they require users and IT departments to use them. And once cleaned, PCs are quickly re-infected through security holes in Windows and IE, or through trojans accidentally installed by PC users themselves.

At this time there is no obvious technical or legal solution to spam control. It is illegal, but finding and catching spammers is very difficult.

Mix 'em up for better results?

Seems to me harvester #13 could handle all the encoded addresses except those encoded with URL Character Codes, and #16 was the only one which could handle non-URL Character Codes, but it couldn't read other encodings. Could you test a COMBINATION of URL Character Codes and other mixed encoding, including mixing in a few plain-text letters. IOW something like [the form decodes what I'm trying to post so I've had to add some spaces to show what I'm trying to suggest]

&# 109; &# 097; il&# 116; &# 111;:myname% 40mydomain.ext

("mailto:myname@mydomain.ext" with "ma" and "to" in "mailto:" encoded in &#, the @ encoded in URL encoding, and the rest in plain text). A harvester would have to be looking at every character individually to get this, and IMHO that would slow them down significantly.

It also appears that you didn't encode "mailto:" in any of your tests. If the harvesters are keying on "mailto:" encoding the "mailto:" might have some value.

Interesting research results

Interesting research results. I find it interesting that spammers are getting increasingly smart, and are not just looking for text surrounding an 'at' sign. One solution I currently use (for the last 4 months) is "Privatedaddy.com". It uses double XOR encryption (yes, I sniffed the code :-) but is still compatible with browsers that have JavaScript off. Currently I'm satisfied. It's also open source.

What about Javascript encoding?

What's your opinion of encoders that use Javascript, rather than character encoding? Eg Enkoder http://hivelogic.com/enkoder/app . I'd be interested to see the same tests run on their scheme.

Nadeau software consulting