Backdoor Defense via Suppressing Model Shortcuts
Sheng Yang, Yiming Li, Yong Jiang, Shu-Tao Xia

TL;DR
This paper proposes a backdoor defense that suppresses key skip connections in a neural network, reducing the attack success rate while maintaining benign accuracy. The method is validated through extensive experiments.
Contribution
The paper introduces a defense that targets model shortcuts: it suppresses the skip connections in selected critical layers in order to remove embedded backdoors.
Findings
Significant decrease in attack success rate when suppressing skip connections.
Effective backdoor removal with minimal impact on benign accuracy.
Validated on benchmark datasets with extensive experiments.
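The two quantities named in these findings can be made concrete with a small sketch. It follows a common convention for computing the attack success rate (ASR) and benign accuracy; the paper's exact evaluation protocol is not given in this summary, so the function names and the choice to exclude samples whose true class already equals the target label are assumptions for illustration.

```python
def attack_success_rate(preds_on_triggered, true_labels, target_label):
    """Fraction of triggered samples predicted as the attacker's target label.

    Assumption: samples whose true label already equals the target are
    excluded, since they cannot show whether the trigger changed anything.
    """
    eligible = [
        pred
        for pred, label in zip(preds_on_triggered, true_labels)
        if label != target_label
    ]
    if not eligible:
        raise ValueError("no samples outside the target class")
    hits = sum(1 for pred in eligible if pred == target_label)
    return hits / len(eligible)


def benign_accuracy(preds_on_clean, true_labels):
    """Fraction of clean (trigger-free) samples classified correctly."""
    if not true_labels:
        raise ValueError("empty evaluation set")
    correct = sum(1 for pred, label in zip(preds_on_clean, true_labels) if pred == label)
    return correct / len(true_labels)
```

Under this convention, a defense is working when the first number drops toward zero while the second stays close to its pre-defense value.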
Abstract
Recent studies have demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks during the training process. Specifically, adversaries aim to embed hidden backdoors in DNNs so that malicious model predictions can be activated through pre-defined trigger patterns. In this paper, we explore the backdoor mechanism from the angle of the model structure. We select the skip connection for discussion, inspired by the understanding that it helps the model learn "shortcuts", where backdoor triggers are usually easier to learn. Specifically, we demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections. Based on this observation, we design a simple yet effective backdoor removal method by suppressing the skip connections in critical layers selected by our method. We also implement fine-tuning…
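To illustrate what "reducing the outputs of some key skip connections" could look like, here is a minimal sketch of a toy residual block whose identity branch is scaled by a factor. The scaling factor `gamma`, the two-layer form of the residual branch, and all function names are assumptions for illustration; the abstract does not specify how the suppression is parameterized or how the critical layers are selected.

```python
def relu(vec):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in vec]


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def residual_block(x, w1, w2, gamma=1.0):
    """Toy residual block: y = relu(F(x) + gamma * x), F(x) = w2 @ relu(w1 @ x).

    gamma = 1.0 is the ordinary skip connection; gamma < 1.0 suppresses
    the shortcut, and gamma = 0.0 removes it entirely (hypothetical knob).
    """
    hidden = relu(matvec(w1, x))
    branch = matvec(w2, hidden)
    return relu([b + gamma * xi for b, xi in zip(branch, x)])


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, -2.0]
full = residual_block(x, identity, identity, gamma=1.0)        # shortcut intact
suppressed = residual_block(x, identity, identity, gamma=0.5)  # shortcut halved
```

In a real network, such a factor would be applied only to the blocks judged critical, leaving the remaining skip connections unchanged.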
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
