Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness

Weilin Lin; Li Liu; Shaokui Wei; Jianze Li; Hui Xiong

arXiv:2405.20291·cs.CR·May 31, 2024·1 cites

TL;DR

This paper investigates backdoor vulnerabilities in deep neural networks by analyzing weight changes and neuron activity during unlearning, proposing a two-stage defense method that outperforms existing approaches.

Contribution

It introduces a novel backdoor defense method based on weight change analysis and neuron activeness, without requiring poisoned data, and demonstrates its effectiveness through extensive experiments.

Findings

01

Weight changes under poison unlearning and clean unlearning are positively correlated, so backdoor-related neurons can be identified without poisoned data.

02

Neurons of the backdoored model are more active, which suggests suppressing that activity during fine-tuning.

03

Proposed method outperforms recent state-of-the-art defenses on multiple datasets.
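
The first finding can be illustrated with a small sketch. It is not the paper's implementation: the helper names `neuron_weight_change` and `top_changed_neurons`, the use of an L1 norm per neuron, and the synthetic weights that stand in for real unlearning runs are all assumptions made for illustration. The sketch measures how much each neuron's weights move during unlearning, checks that the clean and poison measurements are correlated, and ranks neurons by their clean-unlearning change.

```python
import numpy as np

def neuron_weight_change(w_before, w_after):
    """Per-neuron L1 weight change; each row holds one neuron's weights (assumed layout)."""
    return np.abs(w_after - w_before).sum(axis=1)

def top_changed_neurons(change, k):
    """Indices of the k neurons with the largest weight change, largest first."""
    return np.argsort(change)[::-1][:k]

rng = np.random.default_rng(0)
n_neurons, fan_in = 8, 16
w0 = rng.normal(size=(n_neurons, fan_in))  # weights before unlearning

# Synthetic stand-in for unlearning (assumption): neurons 2 and 5 move a lot
# in both the clean and the poison run, all other neurons barely move.
shift = np.full(n_neurons, 0.01)
shift[[2, 5]] = 1.0
w_clean = w0 + shift[:, None] * rng.normal(size=w0.shape)
w_poison = w0 + shift[:, None] * rng.normal(size=w0.shape)

change_clean = neuron_weight_change(w0, w_clean)
change_poison = neuron_weight_change(w0, w_poison)

# Pearson correlation between the two per-neuron change vectors.
corr = np.corrcoef(change_clean, change_poison)[0, 1]

# Candidate backdoor-related neurons, ranked using clean unlearning only.
suspects = top_changed_neurons(change_clean, k=2)
print("correlation:", corr)
print("suspect neurons:", sorted(suspects.tolist()))
```

In a real setting, the two "after" weight matrices would come from actually unlearning the backdoored model on clean and on poisoned data, and the number of neurons to flag would be a tuned hyperparameter rather than the fixed `k=2` used here.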

Abstract

The security threat of backdoor attacks is a central concern for deep neural networks (DNNs). Recent defenses that need no poisoned data unlearn the model with clean data and then learn a pruning mask. Vanilla fine-tuning on the same clean data can additionally help recover the lost clean accuracy. However, the behavior of clean unlearning is still under-explored, and vanilla fine-tuning unintentionally reintroduces the backdoor effect. In this work, we first investigate model unlearning from the perspective of weight changes and gradient norms, and find two interesting observations in the backdoored model: 1) the weight changes between poison and clean unlearning are positively correlated, making it possible for us to identify the backdoor-related neurons without using poisoned data; 2) the neurons of the backdoored model are more active (i.e., larger…

Videos

Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness · slideslive

Taxonomy

Topics: Smart Grid Security and Resilience · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications

Methods: Pruning