Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness
Weilin Lin, Li Liu, Shaokui Wei, Jianze Li, Hui Xiong

TL;DR
This paper investigates backdoor vulnerabilities in deep neural networks by analyzing weight changes and neuron activity during unlearning, proposing a two-stage defense method that outperforms existing approaches.
Contribution
It introduces a novel backdoor defense method based on weight change analysis and neuron activeness, without requiring poisoned data, and demonstrates its effectiveness through extensive experiments.
Findings
Weight changes under poison unlearning and clean unlearning are positively correlated, which allows backdoor-related neurons to be identified without poisoned data.
Neurons in the backdoored model are more active, which motivates suppressing that activity during fine-tuning.
Proposed method outperforms recent state-of-the-art defenses on multiple datasets.
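The first finding above can be illustrated with a minimal sketch. This is not the authors' implementation: the per-neuron L1 norm as the change measure, the function names, and the toy weight matrices are all assumptions made for illustration. It shows how per-neuron weight changes from two unlearning runs might be compared and how the most-changed neurons could be flagged from the clean run alone.

```python
import numpy as np


def neuron_weight_change(w_before, w_after):
    """Per-neuron L1 norm of the weight change (rows = output neurons).

    The L1 norm is an assumed choice of change measure, not taken from the paper.
    """
    delta = w_after - w_before
    return np.abs(delta).reshape(delta.shape[0], -1).sum(axis=1)


def top_k_neurons(change, k):
    """Indices of the k neurons with the largest change, in descending order."""
    return np.argsort(-change, kind="stable")[:k]


def pearson(a, b):
    """Pearson correlation between two per-neuron change vectors."""
    return float(np.corrcoef(a, b)[0, 1])


# Toy layer with 4 neurons and 3 inputs (hypothetical values, not real results).
w0 = np.zeros((4, 3))

# Pretend clean unlearning mostly moved neuron 2 and slightly moved neuron 0.
w_clean = w0.copy()
w_clean[2] += 1.0
w_clean[0] += 0.1

# Pretend poison unlearning moved the same neurons, with larger magnitude.
w_poison = w0.copy()
w_poison[2] += 2.0
w_poison[0] += 0.3

clean_change = neuron_weight_change(w0, w_clean)
poison_change = neuron_weight_change(w0, w_poison)

# A positive correlation here mirrors the relationship described in the finding.
correlation = pearson(clean_change, poison_change)

# Neurons flagged using clean unlearning only, i.e. without poisoned data.
suspects = top_k_neurons(clean_change, k=1)
```

In this toy setup the two change vectors are strongly correlated by construction, so ranking by the clean-unlearning change flags the same neuron that the poison-unlearning change would. With a real model, the weight matrices would come from checkpoints saved before and after unlearning.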
Abstract
The security threat of backdoor attacks is a central concern for deep neural networks (DNNs). Recent defenses that do not require poisoned data unlearn the model with clean data and then learn a pruning mask. Additionally, vanilla fine-tuning with the same clean data can help recover the lost clean accuracy. However, the behavior of clean unlearning is still under-explored, and vanilla fine-tuning unintentionally reintroduces the backdoor effect. In this work, we first investigate model unlearning from the perspective of weight changes and gradient norms, and find two interesting observations in the backdoored model: 1) the weight changes between poison and clean unlearning are positively correlated, making it possible to identify the backdoor-related neurons without using poisoned data; 2) the neurons of the backdoored model are more active (i.e., larger…
Taxonomy
Topics: Smart Grid Security and Resilience · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications
Methods: Pruning
