Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection
Hao Fu, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami

TL;DR
This paper introduces a data-efficient black-box backdoor detection method for deep neural networks that uses five metrics and novelty detectors to identify poisoned inputs, demonstrating effectiveness across various attacks.
Contribution
The paper presents a novel backdoor detection approach that combines five metrics with a meta detector, requires only minimal clean data, and can be adapted to future attack types.
Findings
Effective detection across multiple backdoor attack types
Utilizes a small clean validation dataset for training detectors
Can be incrementally improved with additional metrics
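The detection stage summarized above uses one novelty detector per metric. Each detector is trained on metric values from a small clean validation set, and a meta detector fuses their outputs and can take in extra metrics. The sketch below shows that structure under stated assumptions. `ZScoreNoveltyDetector`, its threshold of 3, and the majority-vote fusion in `MetaDetector` are hypothetical stand-ins chosen for simplicity. This summary does not specify which detectors or which fusion rule the paper actually uses.

```python
import numpy as np

class ZScoreNoveltyDetector:
    """Hypothetical one-metric novelty detector: flags values far from the clean mean."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, values):
        # values: 1-D metric values computed on clean validation inputs
        self.mean = float(np.mean(values))
        self.std = float(np.std(values)) + 1e-8  # guard against zero variance
        return self

    def is_novel(self, values):
        # True where a value lies more than `threshold` std devs from the clean mean
        return np.abs((np.asarray(values) - self.mean) / self.std) > self.threshold


class MetaDetector:
    """Hypothetical meta detector: one novelty detector per metric column,
    fused by majority vote. Supporting a new metric only adds a column."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, clean_metrics):
        # clean_metrics: (n_clean, n_metrics) array from the small clean validation set
        self.detectors = [ZScoreNoveltyDetector(self.threshold).fit(col)
                          for col in clean_metrics.T]
        return self

    def predict(self, metrics):
        # metrics: (n_inputs, n_metrics); returns True where an input is flagged
        votes = np.stack([d.is_novel(col)
                          for d, col in zip(self.detectors, metrics.T)], axis=1)
        return votes.sum(axis=1) > len(self.detectors) / 2


rng = np.random.default_rng(0)
# Made-up stand-in values for five metrics on 50 clean inputs (not real data)
clean_metrics = rng.normal(0.5, 0.05, size=(50, 5))
detector = MetaDetector().fit(clean_metrics)

test = np.array([[0.50, 0.52, 0.48, 0.50, 0.51],   # close to the clean statistics
                 [1.00, 1.00, 1.00, 0.50, 1.00]])  # four metrics far from clean
flags = detector.predict(test)
print(flags.tolist())  # [False, True]
```

Because each metric has its own detector, adding a metric in this sketch only means appending a column to the metric matrix. The fusion rule stays the same.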
Abstract
This paper proposes a data-efficient detection method for deep neural networks against backdoor attacks under a black-box scenario. The proposed approach is motivated by the intuition that features corresponding to triggers have a higher influence in determining the backdoored network output than any other benign features. To quantitatively measure the effects of triggers and benign features on determining the backdoored network output, we introduce five metrics. To compute the five metric values for a given input, we first generate several synthetic samples by injecting the input's partial contents into clean validation samples. Then, the five metrics are computed by using the output labels of the corresponding synthetic samples. One contribution of this work is the use of a tiny clean validation dataset. With the five metrics computed, five novelty detectors are trained from the…
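The abstract describes two steps. First, synthetic samples are built by injecting part of an input into clean validation samples. Second, the input is scored using only the labels the model returns for those samples. The Python sketch below illustrates both steps under stated assumptions and is not the paper's implementation. The rectangular region `mask`, the `agreement_rate` score, and `toy_backdoored_model` are all hypothetical. In particular, `agreement_rate` is not one of the paper's five metrics, whose definitions are not given here.

```python
import numpy as np

def synthesize(x, clean, mask):
    # Copy the pixels of x selected by the boolean `mask` into every clean sample.
    # Shapes: x (H, W), clean (N, H, W), mask (H, W); result is (N, H, W).
    return np.where(mask, x, clean)

def agreement_rate(predict, x, clean, mask):
    # Hypothetical score: fraction of synthetic samples whose black-box label
    # matches the label of x itself. Only output labels are queried.
    target = predict(x[None])[0]
    labels = predict(synthesize(x, clean, mask))
    return float(np.mean(labels == target))

def toy_backdoored_model(batch):
    # Hypothetical stand-in for a backdoored DNN: a 2x2 patch of ones in the
    # top-left corner (the trigger) forces label 7; otherwise the label is
    # 1 if the mean intensity exceeds 0.5, else 0.
    trigger = np.all(batch[:, :2, :2] == 1.0, axis=(1, 2))
    benign = (batch.mean(axis=(1, 2)) > 0.5).astype(int)
    return np.where(trigger, 7, benign)

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 0.4, size=(20, 8, 8))  # tiny clean validation set
mask = np.zeros((8, 8), dtype=bool)
mask[:2, :2] = True                              # region that holds the trigger

benign = np.full((8, 8), 0.9)                    # bright image, no trigger
poisoned = benign.copy()
poisoned[:2, :2] = 1.0                           # same image with the trigger stamped in

for name, x in [("poisoned", poisoned), ("benign", benign)]:
    # Score the trigger region (mask) and everything else (~mask) separately
    print(name,
          agreement_rate(toy_backdoored_model, x, clean, mask),
          agreement_rate(toy_backdoored_model, x, clean, ~mask))
# poisoned 1.0 0.0
# benign 0.0 1.0
```

In this toy setup, pasting only the small trigger region of the poisoned input flips every clean sample to the target label. The same small region taken from a benign input has no effect, and the benign label is carried only by the large remaining area. That asymmetry mirrors the intuition stated in the abstract: trigger features outweigh benign features in determining the backdoored output.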
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection
