Privacy Preserving Spam Filtering
Manas A. Pathak, Mehrbod Sharifi, Bhiksha Raj

TL;DR
This paper proposes a privacy-preserving spam filtering system that allows training and evaluating classifiers on encrypted email data without exposing individual emails, using cryptographic primitives and dimensionality reduction techniques.
Contribution
It introduces a novel protocol combining homomorphic encryption and randomization for privacy-preserving spam classification, and demonstrates its practicality with large-scale experiments.
Findings
Achieves high accuracy with privacy-preserving protocols.
Reduces computational complexity via data-independent dimensionality reduction.
Demonstrates feasibility on large-scale spam filtering tasks.
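As a rough illustration of what a data-independent reduction of character n-gram features can look like, the sketch below hashes each n-gram into a fixed number of buckets. This is an assumption for illustration only: the summary does not say which reduction the authors use, and the function name `hashed_char_ngrams`, the choice of CRC32, and the parameters `n` and `num_buckets` are hypothetical.

```python
import zlib


def hashed_char_ngrams(text, n=3, num_buckets=1024):
    """Map a string to a fixed-length count vector of hashed character n-grams.

    The mapping depends only on the hash function and num_buckets, not on any
    training corpus, so every user can compute it locally without sharing data.
    """
    vec = [0] * num_buckets
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        bucket = zlib.crc32(gram.encode("utf-8")) % num_buckets
        vec[bucket] += 1
    return vec


features = hashed_char_ngrams("free money", n=3, num_buckets=1024)
```

The vector length is fixed by `num_buckets` regardless of vocabulary size, which is what keeps the cost of any per-feature cryptographic operation bounded; the trade-off is that distinct n-grams may collide in the same bucket.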
Abstract
Email is a private medium of communication, and the inherent privacy constraints form a major obstacle in developing effective spam filtering methods, which require access to a large amount of email data belonging to multiple users. To mitigate this problem, we envision a privacy preserving spam filtering system, where the server is able to train and evaluate a logistic regression based spam classifier on the combined email data of all users without being able to observe any emails, using primitives such as homomorphic encryption and randomization. We analyze the protocols for correctness and security, and perform experiments with a prototype system on a large scale spam filtering task. State of the art spam filters often use character n-grams as features, which results in a large, sparse data representation that cannot feasibly be used directly with our training and evaluation protocols.…
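To make the idea of computing on encrypted data concrete, the sketch below uses a toy additively homomorphic scheme (Paillier) to evaluate the linear score w·x of a logistic regression model over encrypted integer features. It is a minimal sketch under stated assumptions, not the paper's protocol: the abstract names homomorphic encryption only in general terms, the primes are far too small to be secure, and all function names are hypothetical. It requires Python 3.8+ for the modular inverse via `pow`.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation. The small primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (negatives are encoded modulo n) under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Decrypt c and map the result back to a signed integer."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    m = ((u - 1) // n) * mu % n
    return m - n if m > n // 2 else m


def encrypted_dot(n, enc_x, w):
    """Compute Enc(sum_i w_i * x_i) from ciphertexts Enc(x_i) and plaintext w_i."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, wi in zip(enc_x, w):
        # Multiplying ciphertexts adds plaintexts; exponentiation scales them.
        acc = (acc * pow(c, wi % n, n2)) % n2
    return acc


n, priv = keygen()
enc_x = [encrypt(n, v) for v in [3, 0, 2]]       # features encrypted by the key holder
enc_score = encrypted_dot(n, enc_x, [5, -7, 4])  # computed without seeing the features
score = decrypt(n, priv, enc_score)
```

The party running `encrypted_dot` sees only ciphertexts, and the key holder sees only the final score. Applying the logistic function and training on encrypted data need additional steps that this sketch does not cover.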
Taxonomy
Topics: Spam and Phishing Detection · Cryptography and Data Security · Internet Traffic Analysis and Secure E-voting
