How do spammers harvest your e-mail address?

University of Washington
Computer Science
Tax Deductible
$5,191
Raised
576%
Funded on 5/31/12

About This Project

How did the spammers find me? Billions of spam emails are sent to people around the world every day. Spam ranges from a mere annoyance to a substantial financial cost. We will look at how spammers obtain the email addresses that they target.

What is the context of this research?

As a former blogger and an avid internet marketer, I can firmly say that every Internet user with an email address has received spam more than once. Spam is not just unsolicited, unapproved contact by a stranger; it can also lead to loss of money and even identity theft. As spammers get more sophisticated, it becomes harder for anyone to tell spam from genuine email. Companies all across the world, irrespective of their market, location or size, lose resources, labor and money to spam. Tech companies, like Google and Yahoo, use about 30 billion watts of electricity (1); that's enough electricity to power 3 million houses for a year. It's amazing just to think about how much energy and money these companies would save if there were no spam.

What is the significance of this project?

Why does this research matter to me? Good question. I hate spam. But it's not just companies that suffer losses from email spam. People like you and me spend a lot of time reviewing spam, and even with strong spam filters we still have to check our spam folders for misclassified email. Everyone has had an important email land in the spam folder.

What are the goals of the project?

Our final product will be a paper describing the results of the study. In this experiment, we will post email addresses to the Web (blogs, websites, newsgroups, social networking sites, forums, web services, whois databases, chatrooms, mailing lists, and microblogging sites such as Twitter and Tumblr) to identify which addresses are harvested. We'll also submit email addresses for e-greeting cards, insurance quotes, free iPad sites, etc. We plan to investigate multiple ways of obscuring an email address (as an image, using [at], HTML escaping, JavaScript, etc.) to see which methods work best for preventing spammers from harvesting your email address.
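Three of the obscuring methods named above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the function names and the sample address are assumptions made for the example, and the image method is left out because it needs a graphics library.

```python
import html


def obfuscate_at(address: str) -> str:
    """Spell out '@' and '.' as bracketed words."""
    local, _, domain = address.partition("@")
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"


def obfuscate_entities(address: str) -> str:
    """Encode every character as a decimal HTML entity.

    A browser renders the entities as the original address, but the
    raw page source no longer contains it as plain text.
    """
    return "".join(f"&#{ord(ch)};" for ch in address)


def obfuscate_javascript(address: str) -> str:
    """Emit a script that assembles the address in the browser.

    Assumes the address contains no quote characters (hypothetical
    simplification for this sketch).
    """
    local, _, domain = address.partition("@")
    return (
        "<script>"
        f"document.write('{local}' + '@' + '{domain}');"
        "</script>"
    )


if __name__ == "__main__":
    sample = "jane@example.com"  # hypothetical address
    print(obfuscate_at(sample))
    print(obfuscate_entities(sample))
    print(obfuscate_javascript(sample))
    # Decoding the entities recovers the address, as a browser would.
    print(html.unescape(obfuscate_entities(sample)))
```

A harvester that only pattern-matches plain addresses in the page source would miss all three forms, while one that decodes entities or runs scripts would not; that difference is what the experiment is meant to measure.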

Budget

  • $960 Research Intern

My budget will be used to hire a research assistant (either an undergraduate or a Master's student) to help administer the experiment. We're aiming for a sample of 1,000+ unique email addresses posted this summer on a diverse set of websites, forums, listservs, blogs, social networking sites, etc. Additionally, we will generate email addresses to sign up for greeting cards, insurance quotes, free iPads, etc. to see which ones send us off-topic spam.
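Using a unique address for each posting means any spam that arrives can be traced back to the place where its address was exposed. One way that bookkeeping might look is sketched below. It is an assumption for illustration only: the domain `example.org`, the `probe-` prefix, and both function names are hypothetical and do not describe the study's actual setup.

```python
import secrets


def make_probe_address(registry, site, method, domain="example.org"):
    """Create a unique address and record where and how it was posted.

    `registry` maps each address to a (site, method) pair. The domain
    and the 'probe-' prefix are placeholders for this sketch.
    """
    while True:
        address = f"probe-{secrets.token_hex(4)}@{domain}"
        if address not in registry:  # retry on the rare collision
            break
    registry[address] = (site, method)
    return address


def attribute_spam(registry, recipient):
    """Return the (site, method) a spammed address was posted under, or None."""
    return registry.get(recipient.lower())


if __name__ == "__main__":
    registry = {}
    addr = make_probe_address(registry, "some-forum", "plain text")
    print(addr, "->", attribute_spam(registry, addr))
```

Because every address appears in exactly one place, a single spam message is enough to show that a given site and obscuring method were harvested.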

Meet the Team

Jeff Huang
Ph.D. candidate

Affiliates

  • Ph.D. candidate, Information School, University of Washington
  • MS, Computer Science, University of Illinois
  • BS, Computer Science, University of Illinois

Background

Jeff Huang is a Ph.D. candidate at the Information School at the University of Washington. He conducts research on information retrieval, has been awarded a research grant from Google, and is a Facebook Fellow. Jeff previously worked at Google, Yahoo, and Microsoft Research.

Additional Information

Relevant publications:

Lueg, C., J. Huang, M.B. Twidale. 2007. Mystery Meat revisited: Spam, Anti-Spam Measures and Digital Redlining. Webology, 5(1).

Lueg, C., J. Huang, M.B. Twidale. 2006. Mystery Meat: Where does spam come from, and why does it matter? EICAR, 150–163.