Boosting Trees for Anti-Spam Email Filtering

Xavier Carreras; Lluis Marquez

arXiv:cs/0109015 · cs.CL · May 23, 2007 · 343 cites

TL;DR

This paper evaluates boosting algorithms for email spam filtering. On the PU1 corpus they outperform the Naive Bayes and decision-tree baselines, and more complex base learners yield better high-precision classifiers, which matters most when misclassification costs are high.

Contribution

The paper compares several AdaBoost variants with confidence-rated predictions, differing in the complexity of their base learners, on the spam filtering task, and shows how more complex base learners affect high-precision classification.

Findings

01

Boosting methods outperform Naive Bayes and decision trees on PU1 corpus.

02

More complex base learners yield better high-precision classifiers.

03

Boosting achieves high F1 scores in spam filtering.
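
For readers unfamiliar with the measures named in the findings, the following is a minimal sketch of how precision, recall, and F1 are computed when spam is treated as the positive class. The function name and the "spam"/"ham" label strings are illustrative assumptions and do not come from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Precision, recall and F1 for the `positive` class.

    The label strings are hypothetical; any hashable labels work.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A "high-precision" filter in this sense is one with few false positives, that is, few legitimate messages marked as spam.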

Abstract

This paper describes a set of comparative experiments for the problem of automatically filtering unwanted electronic mail messages. Several variants of the AdaBoost algorithm with confidence-rated predictions [Schapire & Singer, 99] were applied, which differ in the complexity of the base learners considered. Two main conclusions can be drawn from our experiments: a) The boosting-based methods clearly outperform the baseline learning algorithms (Naive Bayes and Induction of Decision Trees) on the PU1 corpus, achieving very high levels of the F1 measure; b) Increasing the complexity of the base learners makes it possible to obtain better "high-precision" classifiers, which is a very important issue when misclassification costs are considered.
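
As an illustration of the kind of algorithm the abstract refers to, here is a minimal sketch of AdaBoost with confidence-rated predictions in the style of Schapire & Singer, using decision stumps over binary features as the simplest possible base learner. It is not the authors' implementation: the function names, the smoothing constant `eps`, the 0/1 feature encoding, and the `threshold` parameter are all assumptions made for this example.

```python
import math


def train_boosted_stumps(X, y, rounds=10, eps=1e-6):
    """X: list of 0/1 feature vectors; y: labels in {+1 (spam), -1 (legitimate)}.

    Returns a list of stumps (feature_index, c0, c1), where c0 / c1 are the
    real-valued (confidence-rated) predictions for feature value 0 / 1.
    """
    n, d = len(X), len(X[0])
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for f in range(d):
            # W[v][s]: weight of examples with feature value v and
            # label +1 (s = 0) or -1 (s = 1).
            W = [[0.0, 0.0], [0.0, 0.0]]
            for xi, yi, wi in zip(X, y, weights):
                W[xi[f]][0 if yi > 0 else 1] += wi
            # Selection criterion: Z = 2 * sum_v sqrt(W+_v * W-_v), minimised.
            z = 2.0 * sum(math.sqrt(W[v][0] * W[v][1]) for v in (0, 1))
            if best is None or z < best[0]:
                # Smoothed prediction per block: 0.5 * ln(W+ / W-).
                c = [0.5 * math.log((W[v][0] + eps) / (W[v][1] + eps))
                     for v in (0, 1)]
                best = (z, f, c[0], c[1])
        _, f, c0, c1 = best
        model.append((f, c0, c1))
        # Reweight: examples the stump gets wrong (or is unsure about) gain weight.
        weights = [wi * math.exp(-yi * (c1 if xi[f] else c0))
                   for xi, yi, wi in zip(X, y, weights)]
        total = sum(weights)
        weights = [wi / total for wi in weights]
    return model


def score(model, x):
    """Sum of the confidence-rated predictions of all stumps."""
    return sum(c1 if x[f] else c0 for f, c0, c1 in model)


def predict(model, x, threshold=0.0):
    """+1 (spam) if the combined score exceeds the threshold, else -1."""
    return 1 if score(model, x) > threshold else -1
```

Raising `threshold` above zero is one common way to trade recall for precision when marking a legitimate message as spam is costly. It is shown here only as an assumption and is not necessarily the mechanism used in the paper, which instead varies the complexity of the base learners.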

Taxonomy

Topics: Spam and Phishing Detection · Internet Traffic Analysis and Secure E-voting · Network Security and Intrusion Detection