The Spam Filtering Accuracy Plateau at 99.9% Accuracy and How to Get Past It
MIT Spam Conference, 2005-01-21
Language: English
Note: Presented at the MIT Spam Conference on January 18, 2004, in Cambridge, MA, USA
Abstract
Bayesian filters have become the standard for spam filtering; unfortunately, most Bayesian filters seem to reach an accuracy plateau at 99.9%. We experimentally compare the training methods TEFT, TOE, and TUNE, as well as pure Bayesian, token-bag, token-sequence, SBPH, and Markovian discriminators. The results demonstrate that TUNE is indeed the best training method but is computationally exorbitant. Markovian discrimination is considerably more accurate than Bayesian discrimination but is still not sufficient to reach four-nines (99.99%) accuracy, so other techniques such as inoculation are needed.
