Adversarial Feature Map Pruning for Backdoor
Dong Huang, Qingwen Bu

TL;DR
This paper introduces Adversarial Feature Map Pruning (FMP), a novel method to defend neural networks against backdoor attacks by pruning backdoor-related feature maps, effectively reducing attack success rates while maintaining high model accuracy.
Contribution
FMP is a new defense strategy that prunes backdoor feature maps instead of reproducing triggers, improving robustness against complex and invisible backdoor attacks.
Findings
FMP reduces attack success rate to 2.86% on CIFAR10.
FMP maintains high robust accuracy, e.g., 87.40% on CIFAR10.
FMP outperforms existing defenses against complex triggers.
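The two metrics quoted above are not defined on this page. The sketch below assumes the convention common in backdoor benchmarks: attack success rate (ASR) is the fraction of triggered inputs, excluding those whose true class is already the target, that are classified as the attacker's target label, and robust accuracy (RA) is the fraction of those same inputs that are still classified as their true label. The function names and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def backdoor_metrics(
    preds_on_triggered: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> Tuple[float, float]:
    """Return (ASR, RA) under the assumed benchmark-style definitions.

    Samples whose true label equals the target label are skipped, because
    predicting the target for them is not evidence of a backdoor.
    """
    pairs = [
        (p, t)
        for p, t in zip(preds_on_triggered, true_labels)
        if t != target_label
    ]
    if not pairs:
        raise ValueError("no triggered samples outside the target class")
    asr = sum(1 for p, _ in pairs if p == target_label) / len(pairs)
    ra = sum(1 for p, t in pairs if p == t) / len(pairs)
    return asr, ra


# Toy example (hypothetical labels): five triggered inputs, target class 0.
true_labels = [1, 2, 3, 0, 1]
preds = [0, 2, 3, 0, 1]
asr, ra = backdoor_metrics(preds, true_labels, target_label=0)
print(f"ASR = {asr:.2%}, RA = {ra:.2%}")  # ASR = 25.00%, RA = 75.00%
```

Under these assumed definitions, a defense is doing well when ASR is low and RA is high, which is the pattern the findings report for CIFAR10.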
Abstract
Deep neural networks (DNNs) have been widely used in many critical applications, such as autonomous vehicles and medical diagnosis. However, their security is threatened by backdoor attacks, which are mounted by adding artificial patterns to specific training data. Existing defense strategies primarily use reverse engineering to reproduce the attacker's backdoor trigger, then repair the DNN model by adding the trigger to inputs and fine-tuning the model with ground-truth labels. However, when the attacker's trigger is complex and invisible, the defender cannot reproduce it successfully, so the trigger is not effectively removed and the DNN model is not repaired. In this work, we propose Adversarial Feature Map Pruning for Backdoor (FMP) to mitigate backdoors in the DNN. Unlike existing defense strategies, which focus on…
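To make the idea of pruning feature maps concrete, here is a minimal sketch of the general mechanism: score each channel of a layer's output, then zero out the highest-scoring channels. The scoring rule used here (mean absolute change in a channel's activations between a clean input and a perturbed copy) is an illustrative assumption, not the criterion from the paper, whose details are cut off in the abstract above. All function names and numbers are hypothetical.

```python
from typing import List, Sequence

FeatureMaps = Sequence[Sequence[float]]  # one flat activation list per channel


def channel_sensitivity(clean: FeatureMaps, perturbed: FeatureMaps) -> List[float]:
    """Mean absolute activation change per channel (assumed scoring rule)."""
    scores = []
    for clean_ch, pert_ch in zip(clean, perturbed):
        total = sum(abs(p - c) for c, p in zip(clean_ch, pert_ch))
        scores.append(total / len(clean_ch))
    return scores


def keep_mask(scores: Sequence[float], prune_ratio: float) -> List[bool]:
    """Mark the top `prune_ratio` fraction of channels (by score) for pruning."""
    n_prune = int(prune_ratio * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    pruned = set(ranked[:n_prune])
    return [i not in pruned for i in range(len(scores))]


def apply_mask(feature_maps: FeatureMaps, mask: Sequence[bool]) -> List[List[float]]:
    """Zero every pruned channel and copy the kept ones unchanged."""
    return [
        list(ch) if keep else [0.0] * len(ch)
        for ch, keep in zip(feature_maps, mask)
    ]


# Toy layer output with three channels of two activations each (made-up values).
clean = [[0.1, 0.2], [0.5, 0.5], [0.3, 0.1]]
perturbed = [[0.1, 0.25], [2.5, 3.5], [0.3, 0.1]]

mask = keep_mask(channel_sensitivity(clean, perturbed), prune_ratio=0.34)
print(mask)                     # [True, False, True]
print(apply_mask(clean, mask))  # [[0.1, 0.2], [0.0, 0.0], [0.3, 0.1]]
```

In a real network the mask would be applied to a convolutional layer's output channels, and the abstract's mention of fine-tuning in existing defenses suggests some retraining step would typically follow; neither is shown here.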
Peer Reviews
Decision: ICLR 2024 poster
1. The proposed algorithm is effective for large trigger backdoored models.
2. The proposed algorithm mitigates the backdoored model without the need for reverse engineering the trigger.
3. It has been evaluated on three datasets.
1. The presentation needs improvement, as there are many confusing descriptions (see the Questions section).
2. The three datasets appear to contain a relatively small number of classes. It would be more convincing if the algorithm could be evaluated on more complex datasets, such as ImageNet.
I find this paper interesting. It's important to understand the relationship between pruning and backdoors, and the authors explored this in a systematic way.
On reading, the authors' method looks similar to Adversarial Neuron Pruning (ANP) by Wu and Wang (2021). However, the authors do not discuss it in the related work, although they compare it with the proposed method in the results section. Discussing the methodological differences between the two would help readers understand the contribution.
- The paper provides an extensive evaluation using the standard BackdoorBench benchmark, against multiple attacks and compared to multiple defenses.
- The proposed FMT seems to perform well on average.
- The source code was provided and is pledged to be available open-source upon paper acceptance.

[Update based on authors' response] I would like to thank the authors for their answer and additional results. I think updating the paper based on the discussion would improve it. I have raised my sco
# Novelty and prior work
- The novelty of the paper seems limited. The ideas of using adversarial examples to reverse engineer triggers (e.g., [ANP](https://arxiv.org/pdf/2110.14430.pdf), [AEVA](https://openreview.net/forum?id=OM_lYiHXiCL)) or pruning and retraining trigger weights ([RNP](https://proceedings.mlr.press/v202/li23v.html)) are not themselves novel. The paper does not cite most of these very close prior results and does not provide a conceptual comparison to them.
- The prior art se
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
Methods: Repair · Focus · Pruning
