Stop spammer email harvesters by fragmenting email addresses

A plain email address on a web page is easily found by the email harvesters (spambots) used by spammers. To make it harder to find, split the address into pieces. Separate the pieces with HTML tags or spaces, insert the word “nospam”, replace the “@” with “at”, or put the pieces on separate lines or in separate table cells. The harvester tests reported in this article show that many of these methods work well to stop harvesters.

This article is part of the series “Effective methods to protect email addresses from spammers”, which compares and tests 50 ways to protect email addresses published on a web site.

How to fragment an email address

The email harvesters (“spam robots” or “spambots”) used by spammers scan the text of web pages looking for plain email addresses like “person@example.com”. Fragmenting an email address splits it into pieces by inserting extra text or by rearranging it a bit. The fragmented address is still readable by your site’s visitors, but many harvesters are confused by it. And if harvesters can’t find your protected email address, they can’t add it to mailing lists, and you’ll get less spam.
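To make “scanning for plain email addresses” concrete, here is a minimal Python sketch of a naive harvester. The regular expression `EMAIL_PATTERN` and the function `naive_harvest` are hypothetical. Real harvesters use their own, unpublished matching rules, and many are more thorough than this.

```python
import re

# Hypothetical, simplified pattern: a user name, "@", then a dotted domain.
# Real harvesters use their own (unpublished) rules; this is only a sketch.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def naive_harvest(page_text: str) -> list[str]:
    """Return every substring of the page text that looks like an address."""
    return EMAIL_PATTERN.findall(page_text)


print(naive_harvest("Write to person@example.com for details."))
# ['person@example.com']
print(naive_harvest("Write to person at example dot com for details."))
# []
```

A plain address matches immediately. Each fragmenting method below tries to ensure that a pattern like this one finds nothing.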

Below I discuss each of the most common ways to fragment an email address. After this list, I report the results of running fragmented addresses past a collection of email harvesters to see which methods are effective at protecting your address, and which are not.

Split the address onto separate lines

To protect an email address, split it into two or more parts and place them on separate lines or in separate columns of a table.

Result User: person
Domain: example.com

This works well to protect a single email address, but beware of using it for a large table of addresses, such as a company contact list. Some harvesters can extract a table’s columns of user and domain names and re-assemble them into complete email addresses. This requires that a spammer spend a few minutes configuring the harvester. They are unlikely to take the trouble for only a few addresses, but they may do so for a large contact list.
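The sketch below shows both sides of that trade-off, assuming the “User:” and “Domain:” labels used in the example above. The helper names `split_lines` and `reassemble` are hypothetical. A generic address pattern finds nothing in the split form. However, a scraper that has been told where the user and domain parts are can rejoin them in a few lines, which is why a long contact list is a more attractive target.

```python
import re

# Hypothetical generic pattern a naive harvester might use.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def split_lines(address: str) -> str:
    """Publish the user and domain parts on separate, labeled lines."""
    user, domain = address.split("@", 1)
    return f"User: {user}\nDomain: {domain}"


def reassemble(text: str) -> list[str]:
    """Sketch of a harvester configured for this exact layout:
    pair each 'User:' line with the corresponding 'Domain:' line."""
    users = re.findall(r"^User: (\S+)$", text, re.MULTILINE)
    domains = re.findall(r"^Domain: (\S+)$", text, re.MULTILINE)
    return [f"{u}@{d}" for u, d in zip(users, domains)]


page = split_lines("person@example.com")
print(EMAIL_PATTERN.findall(page))  # []
print(reassemble(page))             # ['person@example.com']
```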

Add “nospam” within the address

Harvesters extract addresses and add them to a list that they later validate by probing email servers. Invalid email addresses are discarded. To protect an address, publish an invalid form of it. For example, add the word “nospam”, “removeme”, “ABC”, or whatever you like into the middle of the address. Add a comment near the address telling site visitors how to remove the extra word and get a valid address. Harvesters won’t read and understand your comment, so they won’t know how to make the invalid address valid.

Result person-nospam@example.com (remove "-nospam" before use)

The U.S. Federal Trade Commission (FTC) web page “Don’t Want Your Email Address Harvested?” recommends adding “spamaway” to protect an address, such as “person@spamaway.example.com”. The word you use doesn’t matter; it can be anything you like.

Many spambots have an optional email address filter that can watch for addresses to skip, such as “webmaster” or “support”. Some of these harvesters come pre-configured to skip email addresses containing the word “spam”. However, spammers can (and probably do) disable this email filtering.

If you use a “mailto” link to point to your email address, you’ll need to protect the visible address and the address in the “href” part of the link.
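Here is a minimal Python sketch of both placements of a marker word, and of a “mailto” link that carries the marker in both the `href` and the visible text. The function names `add_marker` and `protected_mailto`, the hyphen separator, and the wording of the note are illustrative assumptions. Any marker and any explanatory note will do.

```python
from html import escape


def add_marker(address: str, marker: str = "nospam", in_domain: bool = False) -> str:
    """Insert a removable marker word into the user name or the domain name."""
    user, domain = address.split("@", 1)
    if in_domain:
        return f"{user}@{marker}.{domain}"  # e.g. person@spamaway.example.com
    return f"{user}-{marker}@{domain}"      # e.g. person-nospam@example.com


def protected_mailto(address: str, marker: str = "nospam") -> str:
    """Build a mailto link whose href AND visible text both carry the marker."""
    invalid = escape(add_marker(address, marker))
    note = escape(f'(remove "-{marker}" before use)', quote=False)
    return f'<a href="mailto:{invalid}">{invalid}</a> {note}'


print(protected_mailto("person@example.com"))
# <a href="mailto:person-nospam@example.com">person-nospam@example.com</a> (remove "-nospam" before use)
```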

Spell out the punctuation

Email harvesters scan your web pages looking for an “@” character; the words on either side of it form an email address. Block harvesters by typing “at” instead of “@”, and “dot” instead of “.”. Site visitors will know what you mean.

Result person at example dot com

Common variations on this write “(at)”, “[at]”, “-at-”, “(dot)”, and “[dot]” instead. This approach is very widely used to protect email addresses published in newsgroups and mailing list archives.
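Here is a minimal Python sketch of three of these variations. The `STYLES` table and the `spell_out` function are hypothetical. The sketch assumes that “.” is spelled out in the same style as “@”, which is a choice, not a rule.

```python
# Replacement words for "@" and "." in three of the styles listed above.
STYLES = {
    "plain": (" at ", " dot "),    # person at example dot com
    "paren": ("(at)", "(dot)"),    # person(at)example(dot)com
    "bracket": ("[at]", "[dot]"),  # person[at]example[dot]com
}


def spell_out(address: str, style: str = "plain") -> str:
    """Replace "@" and every "." with words in the chosen style."""
    at_word, dot_word = STYLES[style]
    return address.replace(".", dot_word).replace("@", at_word)


for style in STYLES:
    print(spell_out("person@example.com", style))
```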

A 2005 U.S. Federal Trade Commission (FTC) report, Email Address Harvesting and the Effectiveness of Anti-Spam Filters (PDF), tested this approach and found that it stopped nearly all spam. But that was in 2005. Today, some spambots can recognize this trick.

Add spaces between the characters

Email addresses cannot contain spaces (technically, the user name part can, but only if it is enclosed in quotation marks, which almost nobody does). To make your address hard for a harvester to find, insert spaces between the characters. Site visitors will be able to read the protected email address, but harvesters won’t see it.

Result p e r s o n @ e x a m p l e . c o m
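Here is a minimal Python sketch of this transformation and its effect on a naive address pattern. The `space_out` function and `EMAIL_PATTERN` are hypothetical. Because no character sits directly before the “@”, the pattern has nothing to match.

```python
import re

# Hypothetical generic pattern a naive harvester might use.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def space_out(address: str) -> str:
    """Insert a single space between every pair of characters."""
    return " ".join(address)


spaced = space_out("person@example.com")
print(spaced)                         # p e r s o n @ e x a m p l e . c o m
print(EMAIL_PATTERN.findall(spaced))  # []
```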

Embed an HTML comment

Harvesters are looking for a complete email address, so break it up by inserting an HTML comment within it. The comment is ignored by browsers and invisible to site visitors.

HTML per<!--@junk-->son@example<!--@junk-->.com
Result person@example.com

Bontrager Connection’s Protecting Your Email Address article introduces this method to protect an email address, and then shoots it down by providing a free test page that shows how a harvester could extract the comments and reveal the protected address. Real harvesters, though, may not be as smart as the Bontragers.
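The weakness is easy to demonstrate. In this minimal Python sketch, a single regular expression removes the comments from the sample HTML above and recovers the address. The name `strip_comments` is hypothetical, and this is not the code behind the Bontrager test page.

```python
import re

# Non-greedy match of an HTML comment, including comments that span lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html: str) -> str:
    """Remove HTML comments, just as a browser ignores them when rendering."""
    return COMMENT.sub("", html)


protected = "per<!--@junk-->son@example<!--@junk-->.com"
print(strip_comments(protected))  # person@example.com
```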

Embed an HTML tag

The HTML comment used above might be removed by a clever harvester, revealing the email address. Instead of a comment, add something harder to remove: add an HTML tag such as a <span>. Unlike a comment, an HTML tag is meant to do something, such as format the text, making it harder for a harvester to safely remove it.

An added <span> tag could enclose nothing:

HTML person@<span></span>example.com

or enclose a portion of the email address, such as the “@”:

HTML person<span>@</span>example.com

or enclose junk text that you hide using CSS:

HTML person@<span class="hideme">hideme</span>example.com
CSS .hideme { display: none; }
Result person@example.com

Harvesters do not understand page formatting and do not apply CSS styles. In the last example above, harvesters will not understand that the “hideme” text within the protected address is invisible. If they remove the HTML tags, the email address that’s left still includes “hideme”, making the address invalid.
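Here is a minimal Python sketch of a harvester that strips tags with a regular expression but, as described above, applies no CSS. The name `strip_tags` is hypothetical. Stripping the tags recovers the first two addresses intact, but the hidden-text variant comes out with “hideme” still inside it.

```python
import re

# Crude tag remover: deletes anything that looks like <...>.
TAG = re.compile(r"<[^>]*>")


def strip_tags(html: str) -> str:
    """Drop HTML tags but keep the text between them (CSS is not applied)."""
    return TAG.sub("", html)


samples = [
    "person@<span></span>example.com",
    "person<span>@</span>example.com",
    'person@<span class="hideme">hideme</span>example.com',
]
for snippet in samples:
    print(strip_tags(snippet))
# person@example.com
# person@example.com
# person@hidemeexample.com
```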

Distribute characters into HTML table cells

As an extended example of fragmenting an email address, you can use HTML to place each character into its own table cell. Style the table without borders or cell spacing so that the characters sit close together and remain readable.

HTML <table cellspacing="0" border="0"><tr>
<td>p</td><td>e</td><td>r</td><td>s</td><td>o</td><td>n</td><td>@</td>
<td>e</td><td>x</td><td>a</td><td>m</td><td>p</td><td>l</td><td>e</td>
<td>.</td><td>c</td><td>o</td><td>m</td>
</tr></table>
Result
p e r s o n @ e x a m p l e . c o m
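Writing this table by hand is tedious, so here is a minimal Python sketch that generates it. The function name `table_cells` is hypothetical. The sketch reuses the `cellspacing` and `border` attributes from the example above; these are obsolete in HTML5, and a current page would more likely set `border-collapse` and padding in CSS.

```python
from html import escape


def table_cells(address: str) -> str:
    """Put each character of the address into its own table cell."""
    cells = "".join(f"<td>{escape(ch)}</td>" for ch in address)
    # Same legacy attributes as the example above; modern pages would use CSS.
    return f'<table cellspacing="0" border="0"><tr>{cells}</tr></table>'


markup = table_cells("person@example.com")
print(markup.count("<td>"))  # 18, one cell per character
```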

Draw the address using a CSS “font”

Instead of filling table cells with the characters of an address, leave the cells empty and draw selected cell borders to create the rectangular outlines of characters. Control those borders with CSS. For more flexibility, use nested <div> tags instead of a table. The result is an extreme kind of “font” reminiscent of the digital displays on consumer electronics gear.

The HTML for this is too long to include here, so here is just the first character (a “p”) and the result for an entire address:

HTML <div class="outer" title="papa">
  <div class="lb"></div>
  <div class="li"></div>
  <div class="lb"></div>
  <div class="le"></div>
  <div class="le"></div>
</div>
Result

The implementation of the font comes from Stu Nicholls, who provides the CSS font at his CSSplay web site. To use his font, view his web page source and copy and paste the characters you need.  Give him credit, of course.

Draw the address using ASCII art

Before images were so easy to create, email, and print, there was ASCII art. In this digital-age artistic medium, a large grid of characters represents the dots (pixels) of a picture. Dark parts of the picture are filled with dense characters, like “M” or “#”, and light parts with sparse characters, like “:” or “.”. At normal size the grid looks like a mess, but when the characters are shrunk down, a picture emerges.

Here’s a classic one of the Mona Lisa and a close-up of the face. This image, and many more, are available at Christopher Johnson’s ASCII Art Collection. Some of them are pretty amazing.
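Here is a minimal Python sketch of the brightness-to-character mapping described above. It assumes that the picture is already available as a grid of brightness values from 0.0 (black) to 1.0 (white); loading and shrinking a real image is left out. The character ramp and the name `to_ascii` are arbitrary choices made for illustration.

```python
# Characters ordered from darkest (dense) to lightest (sparse). This ramp is an
# arbitrary choice for illustration, not the one used by any site named here.
RAMP = "M#*:. "


def to_ascii(grid: list[list[float]]) -> str:
    """Map brightness values (0.0 = black, 1.0 = white) to characters."""
    last = len(RAMP) - 1
    lines = []
    for row in grid:
        # Scale each brightness to an index into RAMP; clamp 1.0 to the last slot.
        lines.append("".join(RAMP[min(int(v * len(RAMP)), last)] for v in row))
    return "\n".join(lines)


# A tiny 3x3 "picture": a dark frame around a white center pixel.
print(to_ascii([[0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0]]))
```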

You can do the same thing to protect an email address (how mundane compared to Mona). The HTML for this is far too long to show here, but here’s what it looks like. While this works in all web browsers, many of today’s browsers have a minimum font size preference that prevents the ASCII art address from shrinking as small as it needs to be to look right.

Result (ASCII art drawing of the protected email address)

Mardeg.sitesled.com has a free web page that generates text like this when you enter an email address. The page is titled “The most bloated human-readable email hider in the world!” Yup.  :-)

Results

I tested 23 widely available email harvesters to see how well these methods protect an email address. Each harvester was aimed at a test page containing plain and fragmented email addresses. The table below lists, for each method, how many of the 23 harvesters recognized the protected address; the numbers in parentheses are the cell markers discussed after the table.

All of the harvesters were tested on Windows XP SP2. The names of the harvesters are intentionally left off so that this page does not attract spammers searching for the “best” harvester to download.

Protected email address test results (number of the 23 harvesters that recognized each address)
Plain email address: 23
Split the address onto separate lines: 0
Add “nospam” within the address – user name: 21 (19 marked 1; 2 marked 3)
Add “nospam” within the address – domain name: 22 (all marked 2)
Spell out the punctuation - “ at ”: 1
Spell out the punctuation - “(at)”: 1
Spell out the punctuation - “[at]”: 1
Add spaces between the characters: 0 (1 partial, marked 5)
Embed an HTML comment: 0 (1 partial, marked 4)
Embed an HTML tag - empty: 1
Embed an HTML tag - around “@”: 1
Embed an HTML tag - hidden text: 0
Distribute characters into HTML table cells: 0
Draw the address using a CSS “font”: 0
Draw the address using ASCII art: 0

Every harvester found the plain email address that was not protected.

Most of the harvesters (cells marked with 1 or 2) found the invalid addresses with “nospam” added. Two harvesters (cells marked with 3) incorrectly truncated the “nospam” address by dropping the text before “nospam”. None of these removed “nospam” to get a valid address.

One spam robot found the email address that replaced “@” with “ at ”, and one spambot found “(at)” and “[at]” addresses.

One harvester (cell marked 5) partially decoded the email address that added a space between each character. It correctly identified the domain name with embedded spaces, but not the user name. This left it with an incomplete and useless address.

One harvester partially removed the HTML comment embedded within an email address (cell marked 4).

One harvester recognized email addresses where an empty HTML tag or a tag surrounding the “@” was embedded within the address. None recognized the protected address with an HTML tag surrounding hidden text.

None of the harvesters recognized the protected address split across table cells or onto two lines. And none understood the CSS font or ASCII art forms of the email addresses.

Conclusions

While most of the harvesters did not find the fragmented addresses, a few harvesters did. Recently released harvesters did better. The more common these protection methods become, the more likely it is that newer harvesters will recognize them.

Spelled-out punctuation (e.g. “at”, “(at)”, or “[at]” for “@”) is very widely used to try to protect addresses in newsgroups and mailing lists, but two of the tested harvesters recognized these addresses. One of them was released back in 2004, a year before the FTC’s 2005 report found this method effective, so it has been bypassed for several years. Despite the FTC’s results, protecting an email address by replacing “@” with “at”, “(at)”, or “[at]” is not an effective way to stop email harvesters.
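Part of the reason is that spelled-out punctuation is trivial to undo. This minimal Python sketch assumes that a harvester simply rewrites the three tested styles back to “@” and “.” before matching. The patterns and the function name `harvest_spelled_out` are hypothetical and are not taken from any tested harvester.

```python
import re

# Hypothetical generic pattern a naive harvester might use.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical rewrite rules: "(at)", "[at]" or a standalone " at " becomes "@",
# and likewise for "dot" and ".". Case is ignored.
SPELLED_AT = re.compile(r"\s*[(\[]\s*at\s*[)\]]\s*|\s+at\s+", re.IGNORECASE)
SPELLED_DOT = re.compile(r"\s*[(\[]\s*dot\s*[)\]]\s*|\s+dot\s+", re.IGNORECASE)


def harvest_spelled_out(text: str) -> list[str]:
    """Undo spelled-out punctuation, then look for plain addresses."""
    text = SPELLED_AT.sub("@", text)
    text = SPELLED_DOT.sub(".", text)
    return EMAIL_PATTERN.findall(text)


for sample in ("person at example dot com",
               "person(at)example(dot)com",
               "person [at] example [dot] com"):
    print(harvest_spelled_out(sample))  # ['person@example.com'] each time
```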

One of the tested harvesters almost found the address with spaces added between the characters. Its authors may get it right in the next version of the program, particularly if this method becomes common. Adding spaces between email address characters is probably not effective. Addresses with embedded spaces also copy and paste badly into email programs and cannot be read normally by screen readers for the visually impaired, so they have poor usability and accessibility.

Most of the tested harvesters found the invalid addresses with “nospam” added. Once these addresses are harvested, spammers pass them through a separate email “verifier” that probes email servers to confirm that addresses are good. While I did not test email verifiers, it is simple to write a program that automatically strips out commonly inserted words, such as “nospam” or the FTC’s recommended “spamaway”. Inserting “nospam”, “spamaway”, or any other common phrase into an email address is probably not effective at stopping spammers.
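To show how little code this takes, here is a minimal Python sketch. The marker list `COMMON_MARKERS` and the function `strip_markers` are hypothetical; a real tool would presumably use its own, longer list. In this sketch, a marker that is not on the list, such as “ABC”, passes through unchanged.

```python
import re

# Hypothetical list of commonly inserted marker words (an assumption).
COMMON_MARKERS = ("nospam", "spamaway", "removeme")


def strip_markers(address: str) -> str:
    """Remove common marker words plus one adjoining '-', '_' or '.'."""
    for marker in COMMON_MARKERS:
        word = re.escape(marker)
        # Prefer removing the separator before the word ("person-nospam"),
        # otherwise remove the one after it ("spamaway.example.com").
        pattern = rf"[-_.]{word}|{word}[-_.]?"
        address = re.sub(pattern, "", address, flags=re.IGNORECASE)
    return address


print(strip_markers("person-nospam@example.com"))    # person@example.com
print(strip_markers("person@spamaway.example.com"))  # person@example.com
print(strip_markers("person-ABC@example.com"))       # person-ABC@example.com
```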

Email address protection methods that add an HTML tag within an address presume that harvesters can’t remove it. So far, only one of the tested harvesters does. That harvester should also have recognized the address with an embedded HTML comment, but it did not; this could be an artifact of the particular test address. Still, it is surprising that more harvesters could not strip out HTML tags and comments. This is a simple thing to program, and I expect more harvesters to have this feature soon. Every search engine web spider already has it. Embedding an HTML comment or tag into an email address is not effective.

Embedding an HTML tag with text hidden by CSS will probably defeat harvesters for a while, as will distributing characters into HTML table cells. Even with the HTML tags removed, the resulting text is hard to interpret. Harvesting these addresses would require more sophisticated HTML and CSS handling than spammers are likely to bother with. However, both methods copy and paste badly and cannot be read properly by screen readers for the visually impaired. Embedding hidden text or splitting an address into table cells is effective, but the results have poor usability and accessibility.

While effective, the CSS “font” and ASCII art methods have poor usability and accessibility. These methods are pretty clever, but they’re cumbersome, and the email address they draw can’t be copied and pasted or read by a screen reader.

Splitting an email address onto multiple lines is a good solution: it is effective, usable, and accessible. Because it looks like any other multi-line web page text, addresses shown this way are unlikely to be recognized by spam robots. The protected addresses can be read naturally by screen readers and it isn’t difficult for visitors to copy and paste the address (in two steps) into an email program.

Recommendation: splitting an email address onto multiple lines is easy and it works well. There are other methods that also work. They are discussed in the other articles in this series.

Nadeau software consulting