How do spammers harvest your e-mail address?

Jeff Huang

University of Washington

This project was funded on 31 May 2012.
How did the spammers find me? Billions of spam emails are sent every day. For the people who receive them, spam ranges from an annoyance to a substantial financial cost. We will investigate how spammers obtain the email addresses they target.

What are the goals of this project?

As an experienced ex-blogger and an avid internet marketer, I can firmly say that every Internet user with an email address has received spam more than once. Spam is not just unsolicited, unapproved contact by a stranger; it can also lead to loss of money and even identity theft. As spammers get more sophisticated, it becomes harder for anyone to tell spam from genuine email. Companies all across the world, irrespective of their market, location or size, lose resources, staff time and money to spam. Tech companies like Google and Yahoo use about 30 billion watts of electricity (1), which is enough to power 3 million houses for a year. It's amazing just to think about how much energy and money these companies would save if there were no spam.

Why is this research important?

Why does this research matter to me? Good question. I hate spam. But it's not just companies that suffer losses from email spam. People like you and me spend a lot of time reviewing spam, and even with strong spam filters we still have to check our spam folders for misclassified email. Almost everyone has had an important email end up in the spam folder.

How will the funds be used?

Our final product will be a paper describing the results of the study. In this experiment, we will post email addresses across the Web (blogs, websites, newsgroups, social networking sites, forums, web services, whois databases, chatrooms, mailing lists, and microblogging sites such as Twitter and Tumblr) to identify which addresses are harvested. We'll also submit email addresses to e-greeting card sites, insurance quote forms, free iPad offers, etc. We plan to investigate multiple ways of obscuring an email address (rendering it as an image, writing [at] in place of @, HTML escaping, JavaScript, etc.) to see which methods work best for preventing spammers from harvesting your email.
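To make the obscuring methods above concrete, here is a minimal Python sketch of three of them: the [at] substitution, HTML escaping, and JavaScript assembly. It is only an illustration, not the procedure used in the study. The function names and the sample address jane@example.com are hypothetical, and the image-based method is left out because it needs an imaging library.

```python
import json


def obfuscate_at(address: str) -> str:
    """Spell out the punctuation, e.g. 'jane [at] example [dot] com'."""
    local, domain = address.split("@", 1)
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"


def obfuscate_html_entities(address: str) -> str:
    """Encode every character as a decimal HTML character reference.

    A browser renders the address normally, but the raw page source
    contains no literal '@' for a naive harvester to match.
    """
    return "".join(f"&#{ord(ch)};" for ch in address)


def obfuscate_javascript(address: str) -> str:
    """Return a script tag that assembles the address in the browser.

    The full address never appears as one string in the page source.
    json.dumps is used here only to produce quoted string literals.
    """
    local, domain = address.split("@", 1)
    return (
        "<script>"
        f'document.write({json.dumps(local)} + "@" + {json.dumps(domain)});'
        "</script>"
    )


if __name__ == "__main__":
    sample = "jane@example.com"  # hypothetical address for illustration
    print(obfuscate_at(sample))
    print(obfuscate_html_entities(sample))
    print(obfuscate_javascript(sample))
```

Posting a different variant of a unique address on each site would let an experiment attribute any spam received to the specific site and obscuring method that leaked it.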


Budget Overview

My budget will be used to hire a research assistant (either an undergraduate or a Master's student) to help administer the experiment. We're aiming for a sample of 1000+ unique email addresses posted on a diverse set of websites, forums, listservs, blogs, social networks, etc. this summer. Additionally, we will generate email addresses to sign up for greeting cards, insurance quotes, free iPads, etc. to see which ones send us off-topic spam.

Meet the Researcher


Jeff Huang is a Ph.D. candidate at the Information School at the University of Washington. He conducts research on information retrieval, has been awarded a research grant from Google, and is a Facebook Fellow. Jeff previously worked at Google, Yahoo, and Microsoft Research.

Relevant publications:

Lueg, C., J. Huang, M.B. Twidale. 2007. Mystery Meat revisited: Spam, Anti-Spam Measures and Digital Redlining. Webology, 5(1).

Lueg, C., J. Huang, M.B. Twidale. 2006. Mystery Meat: Where does spam come from, and why does it matter? EICAR, 150–163.
