Detecting Backdoor in Deep Neural Networks via Intentional Adversarial Perturbations

Mingfu Xue; Yinghao Wu; Zhiyu Wu; Yushu Zhang; Jian Wang; Weiqiang Liu

arXiv:2105.14259·cs.CV·May 26, 2023

TL;DR

This paper introduces a low-resource, adversarial perturbation-based method for detecting backdoor triggers in deep neural networks, effective during both training and inference stages, with high accuracy and minimal image distortion.

Contribution

The novel approach uses intentional adversarial perturbations to detect backdoor triggers, outperforming existing methods in efficiency and detection accuracy without requiring prior backdoor information.

Findings

1. Detection rate exceeds 99.9% on multiple datasets
2. Maintains high visual quality with low perturbation norms
3. Outperforms existing methods like STRIP in accuracy and efficiency

Abstract

Recent research shows that deep learning models are susceptible to backdoor attacks. Many defenses against backdoor attacks have been proposed. However, existing defenses require either high computational overhead or information about the backdoor attack, such as the trigger size, which is difficult to obtain in realistic scenarios. In this paper, a novel backdoor detection method based on adversarial examples is proposed. The proposed method leverages intentional adversarial perturbations to detect whether an image contains a trigger, and it can be applied in both the training stage and the inference stage (sanitizing the training set in the training stage and detecting backdoor instances in the inference stage). Specifically, given an untrusted image, an adversarial perturbation is intentionally added to the image. If the prediction of the model on the perturbed image is consistent with that on the…
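The consistency check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear classifier, the sign-gradient perturbation, and the names `predict`, `adversarial_perturbation`, `prediction_is_consistent`, and `epsilon` are all assumptions made for illustration. The abstract is truncated before it states how the result is interpreted, so the sketch stops at reporting whether the prediction survived the perturbation.

```python
import numpy as np


def predict(weights, image):
    """Toy stand-in classifier (assumption): argmax of linear class scores."""
    return int(np.argmax(weights @ image))


def adversarial_perturbation(weights, image, epsilon):
    """Sign-gradient step that shrinks the margin between the predicted
    class and the runner-up class. This is exact for the toy linear model
    above; a real network would need its own attack procedure."""
    scores = weights @ image
    pred = int(np.argmax(scores))
    masked = scores.copy()
    masked[pred] = -np.inf          # exclude the predicted class
    runner_up = int(np.argmax(masked))
    # Gradient of (score[pred] - score[runner_up]) with respect to the image.
    grad = weights[pred] - weights[runner_up]
    return -epsilon * np.sign(grad)


def prediction_is_consistent(weights, image, epsilon):
    """Return True if the label is unchanged after the perturbation."""
    perturbed = image + adversarial_perturbation(weights, image, epsilon)
    return predict(weights, image) == predict(weights, perturbed)


# Hypothetical two-class model and two hypothetical inputs.
weights = np.array([[1.0, 0.0], [0.0, 1.0]])
fragile_image = np.array([0.55, 0.5])   # small margin between classes
robust_image = np.array([5.0, 0.5])     # large margin between classes

fragile_result = prediction_is_consistent(weights, fragile_image, epsilon=0.1)
robust_result = prediction_is_consistent(weights, robust_image, epsilon=0.1)
```

In this toy setting the small-margin input changes label under the perturbation while the large-margin input does not, which is the kind of signal the method compares across untrusted images.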

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques