Tuesday, June 9, 2009

Ever heard of email obfuscation??

I just found out that the institution I (used to) work for has created a contact page for me, exposing my (old-but-still-in-use) professional email address in plain HTML! They just refactored the whole website—for the better—and could have taken a minute to think about email harvesters and their spammer friends. Their mail server is already almost choking to death with unsolicited mail, but they thought it would be a good idea to invite more spam to the party. Tss.

The easy way to advertise your email address on a webpage is to wrap it in an HTML <a> tag whose href attribute uses the mailto: scheme, like this:
<a href="mailto:your.login@your.domain-name.com">Send me spam!</a>
The syntax is fairly simple. Check out this site at the University of Nebraska (among others) if you'd like to see how to prefill the subject of the message or specify multiple recipients (more people to send spam to, yay!).
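As a quick illustration of that syntax, here is a minimal Python sketch that assembles a mailto: URL with several recipients and a prefilled subject, following the general form described in RFC 6068 (recipients separated by commas, header fields appended after a ?). The function name build_mailto and the example addresses are made up for the illustration.

```python
import html
from typing import Optional
from urllib.parse import quote

def build_mailto(recipients: list[str], subject: Optional[str] = None,
                 body: Optional[str] = None) -> str:
    """Build a mailto: URL (hypothetical helper); several recipients are joined with commas."""
    # '@' is kept as-is; anything unsafe in an address gets percent-encoded.
    to = ",".join(quote(address, safe="@") for address in recipients)
    fields = []
    if subject:
        fields.append("subject=" + quote(subject))  # spaces become %20
    if body:
        fields.append("body=" + quote(body))
    return "mailto:" + to + ("?" + "&".join(fields) if fields else "")

url = build_mailto(["your.login@your.domain-name.com", "someone.else@example.com"],
                   subject="Hello there")
print(url)
# mailto:your.login@your.domain-name.com,someone.else@example.com?subject=Hello%20there

# html.escape turns any '&' between fields into '&amp;', as an HTML attribute expects.
print(f'<a href="{html.escape(url)}">Send me spam!</a>')
```

Of course, every address that ends up in such a link is exactly what a harvester is looking for.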

This is all very nice, but it means a very simple crawler can open the page to suck out your email address and have some fun with it. Note that the site I just pointed you to has the following recommendation:
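To make that concrete, here is roughly what the extraction step of such a crawler could look like, assuming it does nothing smarter than run a regular expression over the raw HTML. The pattern is a deliberate simplification (real addresses can be more exotic) and harvest is a hypothetical name, but it is enough to pull the address out of the link shown above.

```python
import re

# Deliberately simple pattern (login@domain.tld): good enough to show the risk.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(page: str) -> list[str]:
    """Return each distinct address-looking string found in raw HTML, in order of appearance."""
    found: list[str] = []
    for address in EMAIL_RE.findall(page):
        if address not in found:
            found.append(address)
    return found

page = '<a href="mailto:your.login@your.domain-name.com">Send me spam!</a>'
print(harvest(page))  # ['your.login@your.domain-name.com']
```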
"It is recommended that you use a process other than MailTo [to] handle the e-mail process from your web site." (quoted from that same page)
The process the site mentions is a way to display your email address without exposing it in plain HTML, lying around for everyone to see. This is called obfuscation. Of course, crawlers will eventually learn to see through the obfuscating code and run away with your email address, but why hand them the info they're looking for right away when you can keep it protected a little longer?

There are lots of clever ways to obfuscate an email address. From what I've seen, there are three types of approaches.
  • Some advocate encoding the content of the HTML tag, for instance using Unicode code points (m is U+006D, written &#x6D; as a hexadecimal character reference) or decimal numeric character references (m = &#109;). This certainly makes the source hard to read for the human eye, but I doubt a bot would have much trouble decoding the string. (Incidentally, the email address encoder at the University of Nebraska is called Spam-me-not!)
  • The second solution involves using a script to scatter the information needed to reconstruct your email address dynamically (and this information can be encoded too!). Here again, lots of people have published their own solutions, but I found this one particularly interesting (simple and readable). However, even if harvesters can't run JavaScript, they can try their luck at assembling the bits of information contained in the script and see whether they get a valid email address...
  • The best solution, then, is probably a site-wide rewrite of email addresses, like the one Roel Van Gils proposes on A List Apart (scroll down to the "Putting it together" section).
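To illustrate the first and third approaches, here is a small Python sketch. encode_entities produces the decimal numeric character references mentioned above (m becomes &#109;), and the demo shows why a bot is unlikely to be stopped by them: the standard library's html.unescape reverses the encoding in one call. rewrite_page is a hypothetical site-wide pass that replaces every mailto link and every bare address in a page with a "login [at] domain [dot] tld" form before the page is served. The regular expressions, function names and output format are all assumptions made for illustration, not necessarily the scheme Van Gils describes.

```python
import html
import re

def encode_entities(text: str) -> str:
    """Encode every character as a decimal numeric character reference (m -> &#109;)."""
    return "".join(f"&#{ord(ch)};" for ch in text)

# Hypothetical site-wide rewrite, meant to run over every page before it is served.
# Matches a whole <a href="mailto:...">...</a> link; group 1 is the address without ?subject=...
MAILTO_LINK_RE = re.compile(
    r'<a\s+href="mailto:([^"?]+)[^"]*"[^>]*>.*?</a>', re.IGNORECASE | re.DOTALL
)
# Bare addresses still present in the text once the links have been handled.
BARE_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def obfuscate(address: str) -> str:
    """Turn login@domain.tld into 'login [at] domain [dot] tld'."""
    login, _, domain = address.partition("@")
    return f"{login} [at] {domain.replace('.', ' [dot] ')}"

def rewrite_page(page: str) -> str:
    # Replace each mailto link (tag and link text) with the obfuscated address as plain text.
    page = MAILTO_LINK_RE.sub(lambda m: obfuscate(m.group(1)), page)
    # Then obfuscate any address that still appears as plain text.
    return BARE_EMAIL_RE.sub(lambda m: obfuscate(m.group(0)), page)

encoded = encode_entities("your.login@your.domain-name.com")
print(encoded[:24])            # &#121;&#111;&#117;&#114;
print(html.unescape(encoded))  # your.login@your.domain-name.com

page = ('<p><a href="mailto:your.login@your.domain-name.com?subject=Hi">Send me spam!</a>'
        " or jane@example.org</p>")
print(rewrite_page(page))
# <p>your.login [at] your [dot] domain-name [dot] com or jane [at] example [dot] org</p>
```

Like any obfuscation, the [at]/[dot] form only buys time, since a harvester could be taught to reverse it too. The appeal of a central pass is that no address on the site slips through unprotected.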
What do you think? How far should webmasters go to protect the email addresses published on their websites?

2 comments:

  1. The easiest way I've found is to turn your email address into an image: take your favorite image editor, open a blank canvas [in the background color of your webpage], write your email address and save it as a PNG.
    Insert the PNG where you want your email address to appear.

  2. Yes, that's exactly what I did on my old home page. One (minor) downside is that people need to type in your address if they want to send you a message (but maybe you have a special trick to share?).
