Selective Amnesia: On Efficient, High-Fidelity and Blind Suppression of Backdoor Effects in Trojaned Machine Learning Models
Rui Zhu, Di Tang, Siyuan Tang, XiaoFeng Wang, Haixu Tang

TL;DR
This paper introduces SEAM, a simple and efficient method for unlearning backdoors in machine learning models. It first induces catastrophic forgetting by retraining the model on randomly labeled clean data, then restores primary-task performance by retraining on correctly labeled clean data. SEAM outperforms existing unlearning techniques.
Contribution
The paper proposes SEAM, a novel backdoor unlearning method inspired by catastrophic forgetting and analyzed using the Neural Tangent Kernel (NTK). It delivers high fidelity and speed while needing only a small amount of clean data.
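For readers unfamiliar with the NTK, the standard linearized-training result below shows why it is a natural tool for reasoning about forgetting. This is background under stated assumptions (scalar output, squared loss $\tfrac12\lVert f(X)-Y\rVert^2$, gradient flow with learning rate $\eta$, the network linearized around its parameters $\theta_0$ before retraining, and $\Theta(X,X)$ invertible), not the paper's own derivation; the symbols $X$, $Y$, $\tilde{x}$ and $\Delta f_t$ are notation introduced here.

```latex
% Empirical NTK and first-order linearization around \theta_0
\Theta(x, x') = \nabla_\theta f(x;\theta_0)^\top \nabla_\theta f(x';\theta_0),
\qquad
f(x;\theta) \approx f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0)

% Retraining on clean inputs X with labels Y under gradient flow:
f_t(X) = Y + e^{-\eta\,\Theta(X,X)\,t}\bigl(f_0(X) - Y\bigr)

% Induced drift on an input outside X, e.g. a triggered input \tilde{x}:
\Delta f_t(\tilde{x}) = f_t(\tilde{x}) - f_0(\tilde{x})
  = -\,\Theta(\tilde{x}, X)\,\Theta(X,X)^{-1}
    \bigl(I - e^{-\eta\,\Theta(X,X)\,t}\bigr)\bigl(f_0(X) - Y\bigr)
```

The cross-kernel term $\Theta(\tilde{x}, X)$ is what allows retraining on clean data alone to move the model's outputs on triggered inputs it never sees, which matches the abstract's setting of having no triggered inputs available. How the paper turns this into a CF measure, and why random labels $Y$ maximize it, is not reproduced here.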
Findings
SEAM achieves high fidelity in backdoor removal within minutes.
SEAM outperforms state-of-the-art unlearning techniques.
SEAM requires only 0.1% of training data for effective unlearning.
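To make the 0.1% figure concrete, the hypothetical helper below draws a uniformly random clean subset of that size. The helper name, the seed, and uniform (rather than, say, class-balanced) sampling are assumptions, not the paper's stated procedure. For a training set the size of CIFAR-10's (50,000 images), it yields 50 samples.

```python
import random

def clean_subset(n_train, fraction=0.001, seed=0):
    """Hypothetical helper: sorted indices of a uniform random subset
    containing round(fraction * n_train) training samples (at least one)."""
    k = max(1, round(fraction * n_train))
    return sorted(random.Random(seed).sample(range(n_train), k))

# Example: 0.1% of a 50,000-sample training set.
idx = clean_subset(50_000)
print(len(idx))  # 50
```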
Abstract
In this paper, we present a simple yet surprisingly effective technique to induce "selective amnesia" on a backdoored model. Our approach, called SEAM, is inspired by the problem of catastrophic forgetting (CF), a long-standing issue in continual learning. Our idea is to retrain a given DNN model on randomly labeled clean data to induce CF on the model, causing it to suddenly forget both the primary and backdoor tasks; we then recover the primary task by retraining the randomized model on correctly labeled clean data. We analyzed SEAM by modeling the unlearning process as continual learning and further approximating a DNN using the Neural Tangent Kernel to measure CF. Our analysis shows that our random-labeling approach actually maximizes the CF on an unknown backdoor in the absence of triggered inputs, and also preserves some feature extraction in the network to enable a fast…
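The two-phase procedure described in the abstract can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: a linear softmax classifier trained with plain SGD stands in for the DNN, 2-D Gaussian blobs stand in for the data, and all function names, epoch counts and learning rates are hypothetical. Phase 1 (`forget`) retrains on clean inputs paired with uniformly random labels; phase 2 (`recover`) retrains on the same clean inputs with their true labels.

```python
import math
import random

# Hypothetical toy stand-ins: a linear softmax classifier replaces the DNN and
# 2-D Gaussian blobs replace the dataset. Only the forget/recover schedule is shown.

def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def accuracy(W, b, data):
    hits = 0
    for x, y in data:
        z = logits(W, b, x)
        hits += z.index(max(z)) == y
    return hits / len(data)

def sgd(W, b, data, epochs, lr, rng):
    """Plain SGD on softmax cross-entropy; updates W and b in place."""
    data = list(data)  # shuffle a copy, not the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = softmax(logits(W, b, x))
            for k in range(len(W)):
                g = p[k] - (1.0 if k == y else 0.0)  # d(loss)/d(logit_k)
                for j, xj in enumerate(x):
                    W[k][j] -= lr * g * xj
                b[k] -= lr * g

def forget(W, b, clean, num_classes, rng, epochs=20, lr=0.05):
    """Phase 1: retrain on clean inputs paired with uniformly random labels."""
    randomized = [(x, rng.randrange(num_classes)) for x, _ in clean]
    sgd(W, b, randomized, epochs, lr, rng)

def recover(W, b, clean, rng, epochs=50, lr=0.05):
    """Phase 2: retrain the randomized model on correctly labeled clean data."""
    sgd(W, b, clean, epochs, lr, rng)

def blobs(n_per_class, rng):
    centers = [(4.0, 0.0), (-4.0, 0.0), (0.0, 4.0)]
    return [((cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)), k)
            for k, (cx, cy) in enumerate(centers)
            for _ in range(n_per_class)]

def run_demo(seed=0):
    rng = random.Random(seed)
    train_set, test_set = blobs(50, rng), blobs(50, rng)
    clean = blobs(20, rng)  # small clean set available to the defender
    W = [[0.0, 0.0] for _ in range(3)]
    b = [0.0, 0.0, 0.0]
    sgd(W, b, train_set, 20, 0.05, rng)  # stand-in for the given trained model
    acc_before = accuracy(W, b, test_set)
    forget(W, b, clean, 3, rng)
    acc_forgot = accuracy(W, b, test_set)
    recover(W, b, clean, rng)
    acc_recovered = accuracy(W, b, test_set)
    return acc_before, acc_forgot, acc_recovered

print(run_demo())
```

Running `run_demo()` should show test accuracy collapsing after the forget phase and returning after recovery. One caveat about this toy: it cannot reproduce backdoor removal itself. In a linear model, a weight attached to a trigger feature that is always zero in clean data receives zero gradient, so clean-data retraining leaves it untouched; in this sketch, any forgetting of a backdoor would have to come from a representation shared with clean inputs, which a linear model on raw features does not have.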
