How do spammers harvest your e-mail address?

By Jeff Huang

Backed by Ben Chestnut, Alex Vaschillo, Mike Longwell, Anukul Veeraraghavan, Audrey Roy, Dustin Chang, Dan Knox, Thibaut Labarre, Kristy Katzenmeyer, Bo Lu, and 5 other backers

University of Washington

Sammamish, Washington

Tax Deductible

DOI: 10.18258/0009

$5,191 raised of $900 goal (576%)

Successfully funded on 5/31/12

About This Project

How did the spammers find me? Billions of people receive billions of spam emails per day. Spam ranges from a mere annoyance to a serious financial cost. We will look at how spammers obtain the email addresses they target.

What is the context of this research?

As an experienced ex-blogger and an avid internet marketer, I can firmly say that every Internet user with an email address has received spam more than once. Spam is not just unsolicited, unapproved contact from a stranger; it can also lead to loss of money and even identity theft. As spammers grow more sophisticated, it becomes harder for anyone to tell spam from genuine email. Companies all across the world, irrespective of their market, location, or size, lose resources, staff time, and money to spam. Tech companies like Google and Yahoo use about 30 billion watts of electricity (1), enough to power 3 million houses. It's amazing to think how much energy and money these companies would save if there were no spam.

What is the significance of this project?

Why does this research matter to me? Good question. I hate spam. But it's not just companies that suffer losses from email spam. People like you and me spend a lot of time reviewing spam, and even with strong spam filters we still have to check our spam folders for misclassified email. Everyone has had an important email land in the spam folder.

What are the goals of the project?

Our final product will be a paper describing the results of the study. In this experiment, we will post email addresses across the Web (blogs, websites, newsgroups, social networking sites, forums, web services, whois databases, chatrooms, mailing lists, and microblogging sites such as Twitter and Tumblr) to identify which addresses are harvested. We'll also submit email addresses to e-greeting card sites, insurance quote forms, free iPad offers, and similar services. We plan to investigate multiple ways of obscuring an address (rendering it as an image, writing [at], HTML escaping, JavaScript, etc.) to see which methods best prevent spammers from harvesting your email.
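To make the obscuring methods concrete, here is a minimal Python sketch of three of them: the [at] substitution, HTML escaping, and JavaScript assembly. It is an illustration only, not the code used in the study. The function names and the address `probe@example.com` are hypothetical, and the image method is left out because it needs an imaging library.

```python
import html


def obfuscate_at(address: str) -> str:
    """Spell out '@' and '.' as bracketed words."""
    local, domain = address.split("@", 1)
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"


def obfuscate_html_entities(address: str) -> str:
    """Encode every character as a decimal HTML entity.

    A browser renders the original address, but the page source
    never contains it as plain text.
    """
    return "".join(f"&#{ord(ch)};" for ch in address)


def obfuscate_javascript(address: str) -> str:
    """Emit a script that assembles the address in the browser.

    Assumes the address contains no quote characters.
    """
    local, domain = address.split("@", 1)
    return (
        "<script>document.write("
        f"'{local}' + '@' + '{domain}'"
        ");</script>"
    )


if __name__ == "__main__":
    address = "probe@example.com"  # hypothetical test address
    print(obfuscate_at(address))
    print(obfuscate_html_entities(address))
    print(obfuscate_javascript(address))
```

A harvester that only pattern-matches raw page text for `name@domain` would miss all three variants, while one that decodes entities or executes scripts would not. That difference is what the comparison between methods is meant to expose.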

Budget

My budget will be used to hire a research assistant (either an undergraduate or a Master's student) to help administer the experiment. We're aiming for a sample of 1000+ unique email addresses posted this summer on a diverse set of websites, forums, listservs, blogs, social networks, and similar venues. Additionally, we will generate email addresses to sign up for greeting cards, insurance quotes, free iPads, etc., to see which services send us off-topic spam.
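With that many addresses, each one has to be traceable to where it was posted. The Python sketch below shows one possible bookkeeping scheme, assuming each address is used for exactly one site and one obscuring method. The project description does not say how addresses are actually generated, so the naming scheme, the function names, and the `example.com` domain are all assumptions.

```python
import hashlib
from itertools import product


def probe_address(site: str, method: str, domain: str = "example.com") -> str:
    """Derive a stable, hard-to-guess address for one (site, method) pair."""
    tag = hashlib.sha256(f"{site}|{method}".encode()).hexdigest()[:10]
    return f"probe-{tag}@{domain}"


def build_registry(sites, methods, domain: str = "example.com") -> dict:
    """Map every generated address back to the place and form it was posted in."""
    return {
        probe_address(site, method, domain): (site, method)
        for site, method in product(sites, methods)
    }


if __name__ == "__main__":
    sites = ["blog", "forum", "whois"]  # hypothetical posting venues
    methods = ["plain", "at", "html", "javascript"]
    registry = build_registry(sites, methods)
    # When spam arrives, the recipient address identifies its source.
    recipient = probe_address("forum", "html")
    print(recipient, "->", registry[recipient])
```

Because each address appears in only one place, any spam sent to it points to a single venue and obscuring method.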

Meet the Team

Jeff Huang

Ph.D. candidate

Affiliates

Ph.D. candidate, Information School, University of Washington
MS, Computer Science, University of Illinois
BS, Computer Science, University of Illinois

Team Bio

Jeff Huang

Jeff Huang is a Ph.D. candidate at the Information School at the University of Washington. He conducts research on information retrieval, has been awarded a research grant from Google, and is a Facebook Fellow. Jeff previously worked at Google, Yahoo, and Microsoft Research.

Update: Jeff is now an assistant professor in computer science at Brown University. He runs the Human-Computer Interaction Research Group.

Lab Notes

10 Lab Notes Posted

Update #10: Understanding the obfuscation techniques used!

March 27, 2013

Update #9: Some charts and finding

January 11, 2013

Update #8: Behind the scenes

November 11, 2012

Additional Information

Relevant publications:

Lueg, C., J. Huang, M.B. Twidale. 2007. Mystery Meat revisited: Spam, Anti-Spam Measures and Digital Redlining. Webology, 5(1).

Lueg, C., J. Huang, M.B. Twidale. 2006. Mystery Meat: Where does spam come from, and why does it matter? EICAR, 150–163.

Project Backers

15 Backers
576% Funded
$5,191 Total Donations
$346.07 Average Donation
