All Things Email


The Spam Filtering Accuracy Plateau at 99.9% Accuracy and How to Get Past It

by William S. Yerazunis

MIT Spam Conference, 2005-01-21
Language: English

Note: Presented at the MIT Spam Conference on January 18th, 2004, in Cambridge, MA, USA.

External links

Full text: PDF

Information about this paper

Abstract

Bayesian filters have become the standard for spam filtering; unfortunately, most of them seem to plateau at 99.9% accuracy. We experimentally compare the training methods TEFT (Train Every Thing), TOE (Train Only Errors), and TUNE (Train Until No Errors), as well as pure Bayesian, token-bag, token-sequence, SBPH (Sparse Binary Polynomial Hashing), and Markovian discriminators. The results show that TUNE is indeed the best training method but computationally exorbitant, and that Markovian discrimination is considerably more accurate than Bayesian discrimination. Even so, it is not sufficient to reach four-nines (99.99%) accuracy; other techniques, such as inoculation, are needed.
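
The difference between the three training regimes named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the ToyBayes classifier, its Laplace smoothing, the function names, the sample corpus, and the max_passes cap are all assumptions made for this example. TEFT (Train Every Thing) trains on every message, TOE (Train Only Errors) trains only on misclassified messages, and TUNE (Train Until No Errors) repeats TOE-style passes over the corpus until a pass produces no errors.

```python
import math
from collections import Counter


class ToyBayes:
    """Tiny token-bag naive Bayes classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}
        self.totals = {"spam": 0, "good": 0}

    def train(self, text, label):
        # Add every whitespace-separated token to the counts for this label.
        for tok in text.lower().split():
            self.counts[label][tok] += 1
            self.totals[label] += 1

    def classify(self, text):
        # Vocabulary size (+1 avoids division by zero before any training).
        vocab = len(set(self.counts["spam"]) | set(self.counts["good"])) + 1
        scores = {}
        for label in ("spam", "good"):
            score = 0.0
            for tok in text.lower().split():
                # Laplace-smoothed token likelihood; class priors are ignored.
                num = self.counts[label][tok] + 1
                den = self.totals[label] + vocab
                score += math.log(num / den)
            scores[label] = score
        # Ties are resolved in favor of "good".
        return "spam" if scores["spam"] > scores["good"] else "good"


def train_teft(clf, corpus):
    """TEFT: train on every message, whether or not it was classified correctly."""
    for text, label in corpus:
        clf.train(text, label)


def train_toe(clf, corpus):
    """TOE: a single pass that trains only on misclassified messages."""
    for text, label in corpus:
        if clf.classify(text) != label:
            clf.train(text, label)


def train_tune(clf, corpus, max_passes=20):
    """TUNE: repeat TOE-style passes until one pass has no errors.

    corpus must be a list (it is iterated several times). max_passes is an
    assumed safety cap. Returns the number of passes performed.
    """
    passes = 0
    for _ in range(max_passes):
        passes += 1
        errors = 0
        for text, label in corpus:
            if clf.classify(text) != label:
                clf.train(text, label)
                errors += 1
        if errors == 0:
            break
    return passes


if __name__ == "__main__":
    # Made-up sample messages, not data from the paper.
    sample = [
        ("cheap pills buy now", "spam"),
        ("meeting agenda for monday", "good"),
        ("buy cheap watches now", "spam"),
        ("lunch on monday", "good"),
    ]
    clf = ToyBayes()
    print("TUNE passes:", train_tune(clf, sample))
```

The repeated passes in train_tune show where the extra computational cost of TUNE comes from: every pass reclassifies the whole corpus, whereas TEFT and TOE each touch a message only once.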

Creative Commons. Some Rights Reserved.
Copyright © 2004 Jochen Topf
Unless otherwise noted the contents on this site are licensed under the
Creative Commons Attribution-ShareAlike License.