All Things Email


Trends in Spam Products and Methods

by Geoff Hulten, Anthony Penta, Gopalakrishnan Seshadrinathan, Manav Mishra

Conference on Email and Anti-Spam, 2004-07-30
Language: English

Note: Published at CEAS 2004.

Abstract

In this paper we analyze a very large junk e-mail corpus generated by a hundred thousand volunteer users of the Hotmail e-mail service. We describe how the corpus is being collected and then discuss how both the products advertised by spam and the specific exploits used to avoid spam filters have changed over time. Every day we randomly select one message from the mail stream of each Hotmail volunteer and ask that user to classify it for us. Thanks to these users, we have been receiving tens of thousands of hand-classified messages per day, every day for the past year; our database currently contains over ten million classified messages. In this paper we further analyze two samples of the spam from this data, one from early 2003 and one from early 2004. We categorized the spam by the type of product it sells and by the types of exploits it uses to avoid spam filters. We are aware of very few other large-scale studies of spam. One is the FTC report on false claims in spam [1]. Our study differs by using data sets that were created by randomly sampling over the entire mail stream, rather than by relying on users to report e-mail that offended them; by reporting changes in spam data over time; and by reporting on more categories of spammer exploits. Another relevant large-scale study is our analysis of the geographic origins of spam.

Creative Commons. Some Rights Reserved.
Copyright © 2004 Jochen Topf
Unless otherwise noted the contents on this site are licensed under the
Creative Commons Attribution-ShareAlike License.