Keywords

fuzzy set model, similarity measures, phrase matching, information retrieval

Abstract

Emails are unquestionably one of the most popular communication media today. Not only are they fast and reliable, but they are also generally free. Unfortunately, a significant number of the emails that users receive every day are spam. This is annoying, since spam emails waste users' time as they review and delete them. In addition, spam emails consume resources such as storage, bandwidth, and computer processing time. Many attempts have been made in the past to eradicate spam emails; however, none has proved highly effective. In this paper, we propose a spam-email detection approach, called SpamED, which uses the similarity of phrases in emails to detect spam. Our experiments verify not only that SpamED, using trigrams in emails, is capable of minimizing false positives and false negatives in spam detection, but also that it outperforms a number of existing email filtering approaches, achieving a 96% accuracy rate.
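The abstract says only that SpamED compares trigrams (three-word phrases) across emails. The sketch below is a hypothetical illustration of that general idea, not SpamED itself. It substitutes plain Jaccard overlap of word-trigram sets for whatever similarity measure the paper defines (its keywords point to a fuzzy set model). The function names, the tokenizing regular expression, the 0.25 threshold, and the use of a list of known spam messages are all assumptions made for illustration.

```python
import re


def word_trigrams(text: str) -> set[tuple[str, str, str]]:
    """Return the set of consecutive three-word phrases in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_spam(email: str, spam_examples: list[str], threshold: float = 0.25) -> bool:
    """Hypothetical rule: flag the email if its trigram set is at least
    `threshold`-similar to the trigram set of any known spam message."""
    grams = word_trigrams(email)
    return any(jaccard(grams, word_trigrams(s)) >= threshold for s in spam_examples)


if __name__ == "__main__":
    known_spam = ["Claim your free prize now by clicking this link"]
    print(is_spam("Claim your free prize now before it expires", known_spam))
    print(is_spam("Meeting notes for the project review are attached", known_spam))
```

In the usage example, the first message shares 3 of 10 distinct trigrams with the known spam message (similarity 0.3), so it is flagged; the second shares none and is not. Matching whole phrases rather than single words is what makes a message like "free meeting room now" unlikely to collide with the spam example, which is the intuition behind phrase-based comparison.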

Original Publication Citation

Maria Soledad Pera and Yiu-Kai Ng. "SpamED: A Spam Email Detection Approach Based on Phrase Similarity." Journal of the American Society for Information Science and Technology (JASIST), Volume 60, Issue 2, pp. 393-409, February 2009, Wiley.

Document Type

Peer-Reviewed Article

Publication Date

2009-02-01

Permanent URL

http://hdl.lib.byu.edu/1877/2637

Publisher

Wiley

Language

English

College

Physical and Mathematical Sciences

Department

Computer Science
