Detecting Backdoor in Deep Neural Networks via Intentional Adversarial Perturbations
Mingfu Xue, Yinghao Wu, Zhiyu Wu, Yushu Zhang, Jian Wang, Weiqiang Liu

TL;DR
This paper introduces a low-resource, adversarial perturbation-based method for detecting backdoor triggers in deep neural networks, effective during both training and inference stages, with high accuracy and minimal image distortion.
Contribution
The novel approach uses intentional adversarial perturbations to detect backdoor triggers, outperforming existing methods in efficiency and detection accuracy without requiring prior backdoor information.
Findings
Detection rate exceeds 99.9% on multiple datasets
Maintains high visual quality with low perturbation norms
Outperforms existing methods like STRIP in accuracy and efficiency
Abstract
Recent research shows that deep learning models are susceptible to backdoor attacks. Many defenses against backdoor attacks have been proposed. However, existing defenses require high computational overhead or backdoor attack information such as the trigger size, which is difficult to obtain in realistic scenarios. In this paper, a novel backdoor detection method based on adversarial examples is proposed. The proposed method leverages intentional adversarial perturbations to detect whether an image contains a trigger, and can be applied in both the training stage (to sanitize the training set) and the inference stage (to detect backdoor instances). Specifically, given an untrusted image, an adversarial perturbation is intentionally added to the image. If the prediction of the model on the perturbed image is consistent with that on the…
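The check described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. Because the abstract is cut off above, the decision rule used here (an image whose predicted label survives the perturbation is flagged as a suspected backdoor instance) is an assumption. The names `flag_if_prediction_survives` and `toy_predict`, the corner-pixel trigger, and the uniform perturbation are all hypothetical stand-ins; in practice the perturbation would come from an adversarial attack against the model under test.

```python
import numpy as np


def flag_if_prediction_survives(predict, image, perturbation):
    """Return True when the predicted label is unchanged by the perturbation.

    Assumed decision rule: an unchanged label marks the image as a
    suspected backdoor instance.
    """
    perturbed = np.clip(image + perturbation, 0.0, 1.0)
    return predict(image) == predict(perturbed)


def toy_predict(image):
    """Hypothetical backdoored classifier on 4x4 images with values in [0, 1].

    A bright top-left pixel acts as the trigger and forces label 7.
    Otherwise the label is 1 if the mean intensity exceeds 0.5, else 0.
    """
    if image[0, 0] > 0.9:
        return 7
    return int(image.mean() > 0.5)


# Clean image: dark corner, mean intensity just under the decision boundary.
clean = np.full((4, 4), 0.45)
clean[0, 0] = 0.0

# Same image with the hypothetical trigger stamped into the corner.
triggered = clean.copy()
triggered[0, 0] = 1.0

# Stand-in for an adversarial perturbation: a small uniform brightening
# that is enough to flip the label of the clean image.
perturbation = np.full((4, 4), 0.1)

clean_flagged = flag_if_prediction_survives(toy_predict, clean, perturbation)
triggered_flagged = flag_if_prediction_survives(toy_predict, triggered, perturbation)
```

In this toy setup the clean image changes label under the perturbation and is not flagged, while the triggered image keeps the trigger's target label and is flagged. The sketch only illustrates the comparison step and says nothing about how the perturbation is generated in the paper.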
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques
