All Things Email

Online Spam Filter Fusion

by Gordon V. Cormack, Thomas R. Lynam

2006-08-06
Language: English

Note: SIGIR 2006, August 2006.

Abstract

We show that a set of independently developed spam filters may be combined in simple ways to provide substantially better filtering than any of the individual filters. The results of fifty-three spam filters evaluated at the TREC 2005 Spam Track were combined post hoc to simulate the parallel on-line operation of the filters. The combined results were evaluated using the TREC methodology, yielding more than a factor-of-two improvement over the best filter. The simplest method, averaging the binary classifications returned by the individual filters, yields a remarkably good result. A new method, averaging log-odds estimates based on the scores returned by the individual filters, yields a somewhat better result and provides input to SVM- and logistic-regression-based stacking methods. The stacking methods appear to provide further improvement, but only for very large corpora. Of the stacking methods, logistic regression yields the better result. Finally, we show that it is possible to select a priori small subsets of the filters that, when combined, still outperform the best individual filter by a substantial margin.
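
The two simple fusion rules named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how log-odds are estimated from raw filter scores, so the sketch assumes each filter already reports a spam probability. The function names, the clamping constant `eps`, and the sample inputs are all hypothetical.

```python
import math


def vote_fusion(votes):
    """Average binary verdicts (1 = spam, 0 = ham) from several filters.

    The result is the fraction of filters that voted spam.
    """
    return sum(votes) / len(votes)


def log_odds(p, eps=1e-6):
    """Convert a probability to log-odds.

    Assumption: p is a calibrated spam probability. It is clamped to
    [eps, 1 - eps] so that 0 and 1 do not produce infinite values.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def log_odds_fusion(probs):
    """Average the per-filter log-odds; a positive result leans spam."""
    return sum(log_odds(p) for p in probs) / len(probs)


def to_probability(z):
    """Map a fused log-odds value back to a probability (logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    votes = [1, 1, 0]        # hypothetical verdicts from three filters
    probs = [0.9, 0.8, 0.3]  # hypothetical spam probabilities from the same filters
    print("vote fusion:", vote_fusion(votes))
    fused = log_odds_fusion(probs)
    print("log-odds fusion:", fused, "->", to_probability(fused))
```

A message would then be labelled spam when the fused value exceeds a chosen threshold, for example 0.5 for the vote average or 0 for the averaged log-odds. The fused log-odds could also serve as a feature for a stacking classifier, as the abstract describes, though that step is not shown here.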
