An Investigation on Learning, Polluting, and Unlearning the Spam Emails for Lifelong Learning
Nishchal Parne, Kyathi Puppaala, Nithish Bhupathi, Ripon Patgiri

TL;DR
This paper explores machine unlearning for spam email detection models to efficiently remove polluted data, demonstrating that unlearning is faster and more practical than retraining, thus enhancing model security and robustness.
Contribution
It introduces an unlearning framework integrated into Naive Bayes, Decision Trees, and Random Forest spam detectors, showing its effectiveness over retraining in handling data pollution.
Findings
Unlearning restores model accuracy after data pollution.
Unlearning is faster than retraining across models.
Unlearning effectively mitigates pollution impact.
Abstract
This work studies machine unlearning for security. Many spam email detection methods exist, each using a different algorithm to detect unwanted spam emails, but these models are vulnerable to attacks. Attackers exploit a model by polluting, in various ways, the data on which it is trained. To respond quickly in such situations, the model needs to unlearn the polluted data readily, without retraining. Retraining is impractical in most cases: the model has already been trained on a massive amount of data, all of which would have to be trained again just to remove a small amount of polluted data, often significantly less than 1%. This problem can be solved by developing unlearning frameworks for all spam detection models. In this research, an unlearning module is integrated into spam detection models that are based on Naive Bayes,…
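To make the idea of unlearning without retraining concrete, here is a minimal sketch for the Naive Bayes case. It assumes a count-based multinomial model in which unlearning a sample means subtracting the counts that learning it added. This is one common way to do it and is not necessarily the authors' implementation. The class name `CountingNaiveBayes`, its methods, and the toy emails are all hypothetical.

```python
from collections import Counter
import math


class CountingNaiveBayes:
    """Multinomial Naive Bayes stored as raw counts (illustrative, hypothetical class)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.class_counts = Counter()  # number of emails seen per label
        self.word_counts = {label: Counter() for label in self.LABELS}

    def learn(self, tokens, label):
        self.class_counts[label] += 1
        self.word_counts[label].update(tokens)

    def unlearn(self, tokens, label):
        # Subtract exactly what learn() added for this sample.
        self.class_counts[label] -= 1
        self.word_counts[label].subtract(tokens)
        # Unary plus drops zero entries, so the state matches a model
        # that never saw the sample.
        self.word_counts[label] = +self.word_counts[label]

    def predict(self, tokens):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label in self.LABELS:
            total_words = sum(self.word_counts[label].values())
            # Log prior plus Laplace-smoothed log likelihoods.
            score = math.log((self.class_counts[label] + 1) / (total_docs + 2))
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy data (made up for illustration).
clean = [
    (["win", "free", "prize"], "spam"),
    (["free", "offer", "click"], "spam"),
    (["meeting", "agenda", "monday"], "ham"),
    (["project", "report", "monday"], "ham"),
]
# Pollution: ham-like emails injected with the wrong label.
polluted = [(["meeting", "agenda", "report"], "spam")] * 3

model = CountingNaiveBayes()
for tokens, label in clean + polluted:
    model.learn(tokens, label)
before = model.predict(["meeting", "agenda"])  # prediction while polluted

# Unlearn only the polluted samples; the clean data is never revisited.
for tokens, label in polluted:
    model.unlearn(tokens, label)
after = model.predict(["meeting", "agenda"])

# Reference model retrained from scratch on clean data only.
retrained = CountingNaiveBayes()
for tokens, label in clean:
    retrained.learn(tokens, label)

print(before, after, model.word_counts == retrained.word_counts)
```

In this sketch the cost of `unlearn` grows with the size of the polluted samples only, while retraining has to touch every clean sample again. That matches the intuition behind the claim that unlearning is faster than retraining, although the timings reported in the paper come from the authors' own experiments, not from this toy example.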
Taxonomy
Topics: Spam and Phishing Detection · Network Security and Intrusion Detection · Text and Document Classification Technologies
