Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Rui Min; Zeyu Qin; Nevin L. Zhang; Li Shen; Minhao Cheng

arXiv:2410.09838 · cs.LG · October 17, 2024

TL;DR

This paper investigates how robust backdoor defenses for neural networks remain after purification. It shows that purified models can rapidly re-learn backdoor behavior, and it proposes a new tuning method, Path-Aware Minimization (PAM), to improve post-purification safety without sacrificing accuracy.

Contribution

It uncovers limitations of current backdoor purification methods and introduces Path-Aware Minimization (PAM) to improve robustness against reactivation attacks.

Findings

01 Current purification methods are vulnerable to re-learning backdoors.

02 Query-based Reactivation Attack (QRA) can effectively reactivate backdoors.

03 PAM significantly improves robustness while maintaining accuracy.

Abstract

Backdoor attacks pose a significant threat to Deep Neural Networks (DNNs) as they allow attackers to manipulate model predictions with backdoor triggers. To address these security vulnerabilities, various backdoor purification methods have been proposed to purify compromised models. Typically, these purified models exhibit low Attack Success Rates (ASR), rendering them resistant to backdoored inputs. However, does achieving a low ASR through current safety purification methods truly eliminate learned backdoor features from the pretraining phase? In this paper, we provide an affirmative answer to this question by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods. We find that current safety purification methods are vulnerable to the rapid re-learning of backdoor behavior, even when further fine-tuning of purified models is performed using…
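The abstract relies on the Attack Success Rate (ASR) without defining it. As a rough illustration, ASR is commonly computed as the fraction of trigger-stamped inputs, drawn from classes other than the attacker's target, that the model classifies as the target class. The sketch below follows that common definition; the function name `attack_success_rate` and the toy predictions are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def attack_success_rate(
    preds_on_triggered: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs (true class != target) predicted as the target.

    Assumption: samples whose true class already equals the target are
    excluded, since they would count as "successes" without any backdoor.
    """
    eligible = [
        pred
        for pred, label in zip(preds_on_triggered, true_labels)
        if label != target_label
    ]
    if not eligible:
        raise ValueError("no non-target samples to evaluate")
    hits = sum(1 for pred in eligible if pred == target_label)
    return hits / len(eligible)


# Toy data (made up for illustration): predictions on trigger-stamped inputs
# from a backdoored model and from the same model after purification.
labels = [1, 2, 3, 2, 0]
preds_backdoored = [0, 0, 0, 2, 0]
preds_purified = [1, 2, 0, 2, 0]

asr_backdoored = attack_success_rate(preds_backdoored, labels, target_label=0)
asr_purified = attack_success_rate(preds_purified, labels, target_label=0)
print(asr_backdoored, asr_purified)
```

A low value after purification is what existing defenses report. The question the abstract raises is whether that low value persists once the purified model is fine-tuned further, which would correspond to recomputing the same metric after each additional tuning step.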

Code & Models

Videos

Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense · SlidesLive

Taxonomy

Topics: Combustion and Detonation Processes · Safety Systems Engineering in Autonomy · Risk and Safety Analysis