All Things Email

Word Stemming to Enhance Spam Filtering

by Shabbir Ahmed, Farzana Mithun

Conference on Email and Anti-Spam, 2004-07-30
Language: English

Note: Published at CEAS 2004.

Information about this paper

Abstract

Generally, a content-based spam filter works on the words and phrases of an email's text; if it finds offensive content, it assigns the email a numerical score that depends on that content. Once the score crosses a certain threshold, the email may be considered spam. This technique works well only if the offensive words are lexically correct, that is, valid words with correct spelling. Otherwise, most content-based spam filters will be unable to detect them. In this paper, we show that a word stemming or word hashing technique that can extract the base or stem of a misspelled or modified word can significantly improve the efficiency of any content-based spam filter. We present a simple rule-based word stemming algorithm designed specifically for spam detection, along with experimental results that corroborate our claim.
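
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which this page does not describe; the substitution table, suffix list, keyword weights, and threshold below are all hypothetical values chosen for illustration. The sketch reduces each token to a rough stem (undoing character substitutions, inserted separators, and repeated letters) before looking it up in a weighted keyword list, then compares the total score against a threshold.

```python
import re

# Hypothetical table of characters commonly swapped in for letters.
SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)

# Hypothetical suffixes to strip; a real rule set would be larger.
SUFFIXES = ("ing", "ed", "s")

# Hypothetical keyword weights, written with correct spelling.
WEIGHTS = {"viagra": 5.0, "free": 1.0, "mortgage": 2.0}


def stem(token):
    """Reduce a possibly obfuscated token to a rough stem."""
    word = token.lower().translate(SUBSTITUTIONS)
    word = re.sub(r"[^a-z]", "", word)     # drop separators such as "." or "-"
    word = re.sub(r"(.)\1+", r"\1", word)  # collapse runs of a repeated letter
    for suffix in SUFFIXES:
        # Strip at most one suffix, keeping at least three letters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    return word


# Stem the keywords with the same rules so lookups stay consistent
# (for example, "free" becomes "fre" once repeated letters are collapsed).
STEMMED_WEIGHTS = {stem(word): weight for word, weight in WEIGHTS.items()}


def spam_score(text):
    """Sum the weights of all tokens whose stem matches a keyword."""
    return sum(STEMMED_WEIGHTS.get(stem(tok), 0.0) for tok in text.split())


def is_spam(text, threshold=5.0):
    """Flag the text once its score reaches the (assumed) threshold."""
    return spam_score(text) >= threshold
```

With these assumed rules, `stem("V1agra")` and `stem("v.i.a.g.r.a")` both reduce to the same stem as the correctly spelled keyword, so a filter that matches on stems can score variants that an exact-match lookup would miss. The rules are deliberately aggressive and will also merge some unrelated words, which is the kind of trade-off a purpose-built rule set would need to tune.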

Creative Commons. Some Rights Reserved.
Copyright © 2004 Jochen Topf
Unless otherwise noted the contents on this site are licensed under the
Creative Commons Attribution-ShareAlike License.