NESG

Segmental Parameterisation and Statistical Modelling of E-mail Headers for Spam Detection

Francisco Javier Salcedo Campos; Jesús Esteban Díaz Verdejo; Pedro García-Teodoro
Abstract:
Spammers exploit the popularity and low cost of e-mail services to send unsolicited messages (spam), which fill users’ accounts and waste valuable resources. To combat this problem, many different spam filtering techniques have been proposed in the literature. Nevertheless, most current anti-spamming filtering schemes are based on detecting relevant terms or tokens in the entire message or in only the body, which implies an invasion of users’ privacy. In this paper, a novel spam-filtering technique based solely on the information present in headers is introduced. In this approach, headers are considered as the result of a dynamic process that generates characters. The observed characters are treated as signals and parameterised in accordance with standard signal pre-processing techniques by extracting relevant parameters from the header. From this, Hidden Markov Models (HMMs) are considered for a spam detection system. The performance achieved by our proposal is evaluated and compared with that of other pattern classification paradigms used for spam filtering. The experimental results for SpamAssassin, TREC05 and CEAS 2008 Lab Evaluation improve on those results obtained with other widely used techniques, achieving up to 98.42% of spam detection while keeping the false positive rate below 0.4% and with the added advantages of using only information from the headers and being independent of the language in which the e-mail is written.
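The general idea in the abstract (header characters treated as a signal, cut into segments, summarised by histograms, and scored with HMMs) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the character classes, the window `width` and `step`, the argmax quantisation, and the function names `segment_histograms`, `quantise`, `log_likelihood` and `classify` are all assumptions chosen for illustration, and the HMM parameters are expected to come from training that is not shown here.

```python
import math
from collections import Counter

# Hypothetical character classes; the paper's actual parameters may differ.
CLASSES = ("alpha", "digit", "space", "punct", "other")


def char_class(ch):
    """Map one header character to a coarse class."""
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    if ch.isascii() and ch.isprintable():
        return "punct"
    return "other"


def segment_histograms(header, width=8, step=4):
    """Slide a window over the header; return one normalised class
    histogram (a list of len(CLASSES) floats) per segment."""
    if not header:
        return []
    frames = []
    for start in range(0, max(len(header) - width, 0) + 1, step):
        window = header[start:start + width]
        counts = Counter(char_class(c) for c in window)
        frames.append([counts[c] / len(window) for c in CLASSES])
    return frames


def quantise(frames):
    """Turn each histogram into a discrete symbol: the index of its
    dominant class (an assumed, deliberately crude quantiser)."""
    return [max(range(len(CLASSES)), key=lambda i: f[i]) for f in frames]


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete HMM.
    start[s], trans[r][s] and emit[s][symbol] are probabilities."""
    if not obs:
        return 0.0
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def classify(header, spam_model, ham_model):
    """Label a header by comparing its likelihood under two HMMs.
    Each model is a (start, trans, emit) tuple obtained elsewhere."""
    obs = quantise(segment_histograms(header))
    spam_score = log_likelihood(obs, *spam_model)
    ham_score = log_likelihood(obs, *ham_model)
    return "spam" if spam_score > ham_score else "ham"
```

Comparing the log-likelihood of one model per class is a common way to use HMMs for two-class decisions. Whether the paper uses exactly this decision rule is not stated in the abstract.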
Year:
2012
Type of Publication:
Article
Keywords:
Spam detection, Hidden Markov Model, Mail header, Histogram
Journal:
Information Sciences
Volume:
195
Pages:
45-61
ISSN:
0020-0255
DOI:
10.1016/j.ins.2012.01.022