Confidence Matters: Inspecting Backdoors in Deep Neural Networks via Distribution Transfer
Tong Wang, Yuan Yao, Feng Xu, Miao Xu, Shengwei An, Ting Wang

TL;DR
This paper introduces DTInspector, a backdoor detection method for deep neural networks that exploits the high prediction confidence of poisoned samples. It outperforms existing defenses, especially against advanced attacks.
Contribution
The paper proposes a new backdoor detection approach based on the observation that an effective backdoor requires high-confidence predictions on the poisoned training samples, which enables detection through prediction ratio analysis.
Findings
Effective against five backdoor attacks
Works across four datasets
Detects advanced attack types
Abstract
Backdoor attacks have been shown to be a serious security threat against deep learning models, and detecting whether a given model has been backdoored has become a crucial task. Existing defenses are mainly built upon the observation that the backdoor trigger is usually of small size or affects the activation of only a few neurons. However, these observations are violated in many cases, especially for advanced backdoor attacks, hindering the performance and applicability of the existing defenses. In this paper, we propose a backdoor defense, DTInspector, built upon a new observation. That is, an effective backdoor attack usually requires high prediction confidence on the poisoned training samples, so as to ensure that the trained model exhibits the targeted behavior with a high probability. Based on this observation, DTInspector first learns a patch that could change the predictions of…
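The observation about prediction confidence can be made concrete with a small sketch. The code below is not DTInspector and does not reproduce any step of the paper's method; it only shows one common way to measure prediction confidence (the softmax probability of the predicted class) and to compare its average across two groups of samples. The function names and the logit values are hypothetical and were made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_confidence(logits):
    """Return (predicted class index, softmax probability of that class)."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


def mean_confidence(batch_logits):
    """Average the top-class probability over a batch of logit vectors."""
    confs = [prediction_confidence(logits)[1] for logits in batch_logits]
    return sum(confs) / len(confs)


# Hypothetical logits, invented for illustration only (not from the paper).
clean_logits = [[2.0, 1.0, 0.5], [0.2, 1.5, 1.0]]
poisoned_logits = [[9.0, 0.5, 0.1], [8.5, 0.2, 0.3]]

print("mean confidence, clean samples:   ", mean_confidence(clean_logits))
print("mean confidence, poisoned samples:", mean_confidence(poisoned_logits))
```

With these made-up values the second group is predicted with much higher average confidence than the first, which is the kind of gap the abstract's observation refers to. How DTInspector actually turns this observation into a detection procedure is described in the paper itself.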
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications
