An Investigation on Learning, Polluting, and Unlearning the Spam Emails for Lifelong Learning

Nishchal Parne; Kyathi Puppaala; Nithish Bhupathi; Ripon Patgiri

arXiv:2111.14609·cs.LG·December 28, 2021·1 cites

Open Access

TL;DR

This paper explores machine unlearning for spam email detection models to efficiently remove polluted data, demonstrating that unlearning is faster and more practical than retraining, thus enhancing model security and robustness.

Contribution

It introduces an unlearning framework integrated into Naive Bayes, Decision Trees, and Random Forest spam detectors, showing its effectiveness over retraining in handling data pollution.

Findings

1. Unlearning restores model accuracy after data pollution.
2. Unlearning is faster than retraining across models.
3. Unlearning effectively mitigates pollution impact.

Abstract

This paper studies machine unlearning for security. Several spam email detection methods exist, each employing a different algorithm to detect unwanted spam emails. However, these models are vulnerable to attacks: many attackers exploit a model by polluting, in various ways, the data on which it is trained. To respond quickly in such situations, the model needs to readily unlearn the polluted data without retraining. Retraining is impractical in most cases because the model has already been trained on a massive amount of data, all of which would have to be trained again just to remove a small amount of polluted data, often significantly less than 1%. This problem can be solved by developing unlearning frameworks for all spam detection models. In this research, an unlearning module is integrated into spam detection models that are based on Naive Bayes,…
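
The idea of removing polluted samples without retraining is easiest to see for a count-based Naive Bayes classifier, where a training email contributes only additive counts. The following is a minimal Python sketch under that assumption, not the authors' implementation: the class `CountingNaiveBayes`, its `learn` and `unlearn` methods, and the toy emails are all hypothetical names and data chosen for illustration.

```python
import math
from collections import Counter


class CountingNaiveBayes:
    """Multinomial Naive Bayes stored as raw counts (hypothetical sketch).

    Because the model is only a set of counts, a sample can be removed by
    subtracting exactly what it added, with no pass over the other data.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # Laplace smoothing constant
        self.doc_counts = Counter()   # label -> number of emails seen
        self.word_counts = {}         # label -> Counter of token counts

    def _update(self, tokens, label, sign):
        counts = self.word_counts.setdefault(label, Counter())
        self.doc_counts[label] += sign
        if self.doc_counts[label] <= 0:
            del self.doc_counts[label]
        for tok in tokens:
            counts[tok] += sign
            if counts[tok] <= 0:
                del counts[tok]       # keep state identical to "never seen"
        if not counts:
            del self.word_counts[label]

    def learn(self, tokens, label):
        self._update(tokens, label, +1)

    def unlearn(self, tokens, label):
        # Assumes this exact (tokens, label) pair was learned earlier.
        self._update(tokens, label, -1)

    def predict(self, tokens):
        vocab = {t for c in self.word_counts.values() for t in c}
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts.get(label, Counter())
            denom = sum(counts.values()) + self.alpha * max(len(vocab), 1)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data (made up for this sketch).
clean = [
    (["free", "money", "now"], "spam"),
    (["win", "free", "prize"], "spam"),
    (["meeting", "tomorrow", "agenda"], "ham"),
    (["project", "status", "meeting"], "ham"),
]
# Pollution: spam-like emails injected with the wrong label.
polluted = [(["free", "money", "prize"], "ham")] * 5

model = CountingNaiveBayes()
for tokens, label in clean:
    model.learn(tokens, label)
before = model.predict(["free", "money"])

for tokens, label in polluted:
    model.learn(tokens, label)
during = model.predict(["free", "money"])

for tokens, label in polluted:
    model.unlearn(tokens, label)
after = model.predict(["free", "money"])
```

On this toy data the prediction for `["free", "money"]` flips from spam to ham once the mislabeled emails are learned, and returns to spam after they are unlearned. The cost of `unlearn` depends only on the size of the removed emails, not on the size of the full training set, which is the property that makes unlearning attractive when the polluted fraction is small. Tree-based models such as Decision Trees and Random Forests do not decompose this simply, so this sketch says nothing about how unlearning would work for them.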

Taxonomy

Topics: Spam and Phishing Detection · Network Security and Intrusion Detection · Text and Document Classification Technologies