All Things Email


TREC 2005 Spam Track Overview

by Gordon V. Cormack, Thomas R. Lynam

2005
Language: English

Note: Proc. TREC 2005 - the Fourteenth Text REtrieval Conference, Gaithersburg, 2005.


Information about this paper

Abstract

TREC's Spam Track introduces a standard testing framework that presents a chronological sequence of email messages, one at a time, to a spam filter for classification. The filter yields a binary judgement (spam or ham, i.e. non-spam) which is compared to a human-adjudicated gold standard. The filter also yields a spamminess score, intended to reflect the likelihood that the classified message is spam, which is the subject of post-hoc ROC (Receiver Operating Characteristic) analysis. The gold standard for each message is communicated to the filter immediately following classification. Eight test corpora (email messages plus gold-standard judgements) were used to evaluate 53 subject filters. Five of the corpora (the public corpora) were distributed to participants, who ran their filters on the corpora using a track-supplied toolkit implementing the framework. Three of the corpora (the private corpora) were not distributed to participants; rather, participants submitted filter implementations that were run, using the toolkit, on the private data. Twelve groups participated in the track, submitting 44 filters for evaluation. The other nine subject filters were variants of popular open-source implementations adapted for use in the toolkit in consultation with their authors.
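The evaluation protocol described in the abstract is an online loop. Each message is classified (score plus binary judgement) before its gold-standard label is known, and the label is then fed back immediately. The Python sketch below illustrates that loop and a post-hoc ROC area computation under stated assumptions. Everything in it is hypothetical: `ToyFilter`, `run_online`, `error_rates`, `roc_area`, the token-count scoring rule, the zero threshold and the toy messages are illustrative inventions, not the track toolkit's interface or any evaluated filter.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyFilter:
    """Hypothetical token-count filter used only to drive the loop; not a TREC filter."""
    spam_tokens: Counter = field(default_factory=Counter)
    ham_tokens: Counter = field(default_factory=Counter)
    threshold: float = 0.0  # assumed cut-off turning the score into a binary judgement

    def classify(self, text: str) -> tuple[float, str]:
        # Spamminess score: sum of Laplace-smoothed log ratios of spam vs ham counts.
        score = 0.0
        for tok in set(text.lower().split()):
            score += math.log((self.spam_tokens[tok] + 1) / (self.ham_tokens[tok] + 1))
        return score, ("spam" if score > self.threshold else "ham")

    def train(self, text: str, gold: str) -> None:
        # Learn from the gold-standard label once it has been revealed.
        target = self.spam_tokens if gold == "spam" else self.ham_tokens
        target.update(set(text.lower().split()))


def run_online(filt: ToyFilter, stream):
    """Present messages in chronological order, one at a time."""
    results = []  # (score, judgement, gold) per message
    for text, gold in stream:
        score, judgement = filt.classify(text)  # classify before the label is known
        results.append((score, judgement, gold))
        filt.train(text, gold)  # gold standard fed back immediately afterwards
    return results


def error_rates(results):
    """Fraction of ham judged spam and fraction of spam judged ham."""
    ham = [j for _, j, g in results if g == "ham"]
    spam = [j for _, j, g in results if g == "spam"]
    return (sum(j == "spam" for j in ham) / len(ham),
            sum(j == "ham" for j in spam) / len(spam))


def roc_area(results):
    """Post-hoc ROC area: chance a random spam outscores a random ham (ties count 1/2)."""
    spam = [s for s, _, g in results if g == "spam"]
    ham = [s for s, _, g in results if g == "ham"]
    wins = sum(1.0 if s > h else 0.5 if s == h else 0.0 for s in spam for h in ham)
    return wins / (len(spam) * len(ham))


stream = [  # toy, invented messages in arrival order
    ("cheap pills buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("buy cheap watches now", "spam"),
    ("monday meeting moved", "ham"),
    ("cheap pills now", "spam"),
    ("agenda attached for meeting", "ham"),
]
results = run_online(ToyFilter(), stream)
ham_rate, spam_rate = error_rates(results)
auc = roc_area(results)
print(round(ham_rate, 3), round(spam_rate, 3), round(auc, 3))  # → 0.0 0.333 0.944
```

In this toy run the first spam message is misjudged because the filter has seen no labelled data yet; the immediate feedback lets later messages benefit. Computing the ROC area from the raw scores rather than from the binary judgements assesses the whole ranking, independent of whichever threshold a filter happens to use.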

Creative Commons. Some Rights Reserved.
Copyright © 2004 Jochen Topf
Unless otherwise noted the contents on this site are licensed under the
Creative Commons Attribution-ShareAlike License.