Spam filtering by quantitative profiles

M. Grendár; J. Škutová; V. Špitalský

arXiv:1201.0040 · cs.IR · January 4, 2012 · 1 citation

Open Access

TL;DR

This paper introduces a quantitative profile method for spam filtering that represents each email as a fixed-dimensional numerical vector and, at low computational cost, performs at least comparably to heuristic rules and naive Bayes.

Contribution

It proposes a novel quantitative profile approach using line and character profiles, offering an alternative to bag-of-words for spam filtering.

Findings

01

Quantitative profiles perform comparably to heuristic rules and naive Bayes.

02

The method is computationally efficient.

03

Performance is evaluated on the TREC 2007 and CEAS 2008 corpora and on a private corpus.

Abstract

Instead of the 'bag-of-words' representation, the quantitative profile approach to spam filtering and email categorization represents an email by an m-dimensional vector of numbers, with m fixed in advance. Inspired by Sroufe et al. [Sroufe, P., Phithakkitnukoon, S., Dantu, R., and Cangussu, J. (2010). Email shape analysis. In LNCS, 5935, pp. 18-29], two instances of quantitative profiles are considered: the line profile and the character profile. Their performance is studied on the TREC 2007 and CEAS 2008 corpora and on a private corpus. At low computational cost, the two quantitative profiles achieve performance that is at least comparable to that of heuristic rules and naive Bayes.
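
To make the idea of a fixed-length representation concrete, here is a minimal Python sketch. The abstract does not define the two profiles, so the definitions below are illustrative assumptions, not the authors' method: the hypothetical `line_profile` takes the lengths of the first m lines (zero-padded), and the hypothetical `character_profile` counts byte values folded into m bins. The function names and default values of m are also assumptions.

```python
from collections import Counter


def line_profile(text: str, m: int = 8) -> list[int]:
    """Assumed definition: lengths of the first m lines, zero-padded to m."""
    lengths = [len(line) for line in text.splitlines()[:m]]
    return lengths + [0] * (m - len(lengths))


def character_profile(text: str, m: int = 256) -> list[int]:
    """Assumed definition: counts of UTF-8 byte values, folded into m bins."""
    counts = Counter(text.encode("utf-8", errors="replace"))
    profile = [0] * m
    for byte_value, n in counts.items():
        profile[byte_value % m] += n
    return profile


email = "Hi\nBuy now!!!\n\nBye"
lp = line_profile(email, m=5)      # always has exactly 5 entries
cp = character_profile(email)      # always has exactly 256 entries
```

Whatever the length of the email, both vectors have a dimension fixed in advance, so they could be passed directly to any standard classifier without building a vocabulary, which is one plausible source of the low computational cost the abstract mentions.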

Taxonomy

Topics: Spam and Phishing Detection · Authorship Attribution and Profiling · Text and Document Classification Technologies