Detecting Spammers via Aggregated Historical Data Set

Eitan Menahem; Rami Puzis

arXiv:1205.1357 · cs.CR · May 8, 2012 · 1 citation

Open Access

TL;DR

This paper introduces a machine learning-based sender reputation system using aggregated historical email data, significantly improving spam detection accuracy and reducing computational load for large email providers.

Contribution

The paper presents a novel reputation mechanism leveraging historical data sets and machine learning, outperforming previous methods in spam detection and efficiency.

Findings

01. Detects over 94% of spam emails missed by blacklists

02. Maintains less than 0.5% false positive rate

03. Reduces email filtering computational load by 80%

Abstract

The battle between email service providers and senders of mass unsolicited emails (spam) continues to intensify. Vast numbers of spam emails are sent, mainly from automated botnets distributed around the world. One method for mitigating spam in a computationally efficient manner is fast and accurate blacklisting of the senders. In this work we propose a new sender reputation mechanism that is based on an aggregated historical data set which encodes the behavior of mail transfer agents over time. The historical data set is created from labeled logs of received emails. We use machine learning algorithms to build a model that predicts the "spammingness" of mail transfer agents in the near future. The proposed mechanism is targeted mainly at large enterprises and email service providers and can be used for updating both the blacklists and the whitelists. We evaluate the proposed mechanism…
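
As a rough illustration of the aggregation step the abstract describes, the sketch below turns labeled email logs into one fixed-length history per sending mail transfer agent. It is a minimal sketch under stated assumptions, not the authors' implementation: the `LogEntry` fields, the function name `aggregate_history`, the fixed time windows, and the choice of per-window spam fraction as the feature are all hypothetical. The learning step that would consume these vectors is not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    """One labeled record from a mail log (hypothetical schema)."""
    sender_ip: str   # address of the sending mail transfer agent
    timestamp: int   # seconds since an arbitrary epoch
    is_spam: bool    # label assigned by an existing content filter


def aggregate_history(logs, window_seconds, now, num_windows):
    """Return {sender_ip: [spam fraction per window, oldest first]}.

    Windows in which a sender sent no mail are marked with -1.0 so that
    "no traffic" stays distinguishable from "only legitimate traffic".
    """
    # For each sender, keep [spam_count, total_count] for every window.
    counts = defaultdict(lambda: [[0, 0] for _ in range(num_windows)])
    start = now - window_seconds * num_windows

    for entry in logs:
        # Ignore records outside the history horizon [start, now).
        if not (start <= entry.timestamp < now):
            continue
        idx = (entry.timestamp - start) // window_seconds
        cell = counts[entry.sender_ip][idx]
        cell[0] += int(entry.is_spam)
        cell[1] += 1

    return {
        ip: [spam / total if total else -1.0 for spam, total in cells]
        for ip, cells in counts.items()
    }


# Toy logs: three 10-second windows ending at time 30.
logs = [
    LogEntry("10.0.0.1", 5, True),
    LogEntry("10.0.0.1", 15, True),
    LogEntry("10.0.0.1", 16, False),
    LogEntry("10.0.0.2", 25, False),
]
history = aggregate_history(logs, window_seconds=10, now=30, num_windows=3)
```

Each resulting vector could serve as one training instance for a classifier, with the sender's observed behavior in the following period as the target label. Which features and which learner the paper actually uses cannot be determined from the abstract alone.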

Taxonomy

Topics: Spam and Phishing Detection · Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting