Selective Amnesia: On Efficient, High-Fidelity and Blind Suppression of Backdoor Effects in Trojaned Machine Learning Models

Rui Zhu; Di Tang; Siyuan Tang; XiaoFeng Wang; Haixu Tang

arXiv:2212.04687·cs.LG·August 13, 2024

TL;DR

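This paper introduces SEAM, a simple and efficient method for unlearning backdoors in machine learning models. SEAM first induces catastrophic forgetting by retraining the model on randomly labeled clean data, then restores primary-task performance by retraining on correctly labeled clean data. The authors report that it outperforms existing unlearning techniques.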

Contribution

The paper proposes SEAM, a novel backdoor unlearning method inspired by catastrophic forgetting and analyzed through a Neural Tangent Kernel approximation of the network. The method offers high fidelity and speed while requiring minimal clean data.

Findings

01. SEAM achieves high fidelity in backdoor removal within minutes.

02. SEAM outperforms state-of-the-art unlearning techniques.

03. SEAM requires only 0.1% of training data for effective unlearning.

Abstract

In this paper, we present a simple yet surprisingly effective technique to induce "selective amnesia" on a backdoored model. Our approach, called SEAM, is inspired by the problem of catastrophic forgetting (CF), a long-standing issue in continual learning. Our idea is to retrain a given DNN model on randomly labeled clean data to induce CF on the model, causing it to suddenly forget both the primary and backdoor tasks; we then recover the primary task by retraining the randomized model on correctly labeled clean data. We analyzed SEAM by modeling the unlearning process as continual learning and further approximating a DNN using the Neural Tangent Kernel to measure CF. Our analysis shows that our random-labeling approach actually maximizes the CF on an unknown backdoor in the absence of triggered inputs, and also preserves some feature extraction in the network to enable a fast…
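The two-phase schedule described in the abstract (forget on random labels, then recover on true labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a linear softmax classifier in NumPy stands in for the DNN, the data are synthetic clusters, no backdoor is actually planted, and the names `train` and `seam_sketch`, the learning rate, and the fixed epoch counts are all hypothetical choices. The excerpt above does not say how long each phase should run, so the stopping rule here is a placeholder.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(W, X, y, num_classes, lr=0.5, epochs=200):
    """Full-batch gradient descent on softmax cross-entropy; returns new weights."""
    Y = np.eye(num_classes)[y]  # one-hot targets
    for _ in range(epochs):
        grad = X.T @ (softmax(X @ W) - Y) / len(X)
        W = W - lr * grad
    return W

def accuracy(W, X, y):
    return float((np.argmax(X @ W, axis=1) == y).mean())

def seam_sketch(W, X_clean, y_clean, num_classes, rng):
    """Hypothetical two-phase schedule: forget on random labels, recover on true ones."""
    # Phase 1 (forgetting): retrain on the clean inputs with uniformly random labels.
    y_random = rng.integers(0, num_classes, size=len(y_clean))
    W_forgot = train(W, X_clean, y_random, num_classes)
    # Phase 2 (recovery): retrain the randomized model on the correct labels.
    W_recovered = train(W_forgot, X_clean, y_clean, num_classes)
    return W_forgot, W_recovered

# Toy data (assumption): three well-separated 2-D clusters plus a bias feature.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 2.0], [-2.0, -1.0], [2.0, -1.0]])
y = np.repeat(np.arange(3), 100)
X = centers[y] + 0.3 * rng.standard_normal((300, 2))
X = np.hstack([X, np.ones((300, 1))])

# Stand-in for the "given model": a classifier already trained on the primary task.
W0 = train(np.zeros((3, 3)), X, y, num_classes=3)
W_forgot, W_recovered = seam_sketch(W0, X, y, num_classes=3, rng=rng)

print("original :", accuracy(W0, X, y))
print("forgotten:", accuracy(W_forgot, X, y))
print("recovered:", accuracy(W_recovered, X, y))
```

The sketch only shows the order of the two retraining phases and that the primary task can be relearned from the same small clean set. Whether a backdoor is suppressed depends on the model and the forgetting budget, which this toy example does not attempt to measure.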

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Ferroelectric and Negative Capacitance Devices

Methods: SEAM (selective amnesia via catastrophic forgetting on randomly labeled clean data, followed by recovery on correctly labeled clean data)