Abstract

The Internet has been plagued with endless spam for over 15 years. However, in the last five years spam has morphed from an annoying advertising tool to a social engineering attack vector. Much of today's unwanted email tries to deceive users into replying with passwords, bank account information, or to visit malicious sites which steal login credentials and spread malware. These email-based attacks are known as phishing attacks. Much has been published about these attacks which try to appear real not only to users and subsequently, spam filters. Several sources indicate traditional content filters have a hard time detecting phishing attacks because the emails lack the traditional features and characteristics of spam messages. This thesis tests the hypothesis that by separating the messages into three categories (ham, spam and phish) content filters will yield better filtering performance. Even though experimentation showed three-way classification did not improve performance, several additional premises were tested, including the validity of the claim that phishing emails are too much like legitimate emails and the ability of Naive Bayes classifiers to properly classify emails.

Degree

MS

College and Department

Ira A. Fulton College of Engineering and Technology; Technology

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2012-03-13

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd5101

Keywords

email, spam filtering, phish, phishing attacks, support vector machines, maximum entropy, naive bayes, bayesian filters

Share

COinS