All Things Email

Chung-Kwei: a Pattern-discovery-based System for the Automatic Identification of Unsolicited E-mail Messages (SPAM)

by Isidore Rigoutsos, Tien Huynh

Conference on Email and Anti-Spam, 2004-07-30
Language: English

Note: Published at CEAS 2004.

Abstract

In this paper, we present Chung-Kwei, a system for the analysis of electronic messages and the automatic identification of unsolicited email messages (SPAM). The method uses pattern-discovery as its underlying tool and is another instance of a generic approach that has been the basis of previously successful solutions developed by our group to tackle problems in computational biology such as gene finding and protein annotation. Chung-Kwei can be trained very quickly; as new examples of SPAM become available, the system can re-train itself without interrupting the classification of incoming e-mail. We trained Chung-Kwei on a repository of 87,000 messages, then tested it with a very large collection of 88,000 pieces of SPAM and WHITE email: the current prototype achieved a sensitivity of 96.56% whereas the false positive rate was 0.066%, or one-in-six-thousand. In terms of speed, we are currently capable of classifying 214 messages/second on a 2.2 GHz Intel-Pentium platform. The Chung-Kwei system is part of SpamGuru, a collaborative anti-spam filtering solution that is currently under development at IBM Research.
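
The abstract reports two evaluation metrics and a throughput figure. As a minimal sketch of how such numbers are usually computed, the Python below uses the standard definitions of sensitivity and false positive rate. The function names and the confusion counts are hypothetical, chosen only for illustration; they are not taken from the paper, which does not give its raw counts here.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual SPAM messages that were flagged as SPAM."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of legitimate (WHITE) messages wrongly flagged as SPAM."""
    return false_positives / (false_positives + true_negatives)


def seconds_to_classify(num_messages: int, messages_per_second: float) -> float:
    """Time needed to classify a batch at a fixed throughput."""
    return num_messages / messages_per_second


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the paper's data).
    spam_caught, spam_missed = 965, 35
    white_flagged, white_passed = 1, 1999

    print(f"sensitivity: {sensitivity(spam_caught, spam_missed):.2%}")
    print(f"false positive rate: {false_positive_rate(white_flagged, white_passed):.3%}")

    # Test-set size and throughput as quoted in the abstract.
    print(f"seconds for 88,000 messages: {seconds_to_classify(88_000, 214):.1f}")
```

At the quoted rate of 214 messages/second, a batch the size of the 88,000-message test collection would take roughly 411 seconds, a little under seven minutes.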
