fuzzy set model, similarity measures, phrase matching, information retrieval


Emails are unquestionably one of the most popular communication media these days. Not only are they fast and reliable, but they are also generally free. Unfortunately, a significant number of the emails that users receive on a daily basis are spam. This is annoying, since users waste time reviewing and deleting spam emails. In addition, spam emails consume resources, such as storage, bandwidth, and computer processing time. Many attempts have been made in the past to eradicate spam emails; however, none has proved highly effective. In this paper, we propose a spam-email detection approach, called SpamED, which uses the similarity of phrases in emails to detect spam. The experiments conducted not only verify that SpamED, using trigrams in emails, is capable of minimizing false positives and false negatives in spam detection, but also show that it outperforms a number of existing email filtering approaches with a 96% accuracy rate.

Original Publication Citation

Maria Soledad Pera and Yiu-Kai Ng. "SpamED: A Spam Email Detection Approach Based on Phrase Similarity." Journal of the American Society for Information Science and Technology (JASIST), Volume 6, Issue 2, pp. 393-49, February 29, Wiley.

Document Type

Peer-Reviewed Article

Physical and Mathematical Sciences


Computer Science