Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency
Xiaogeng Liu, Minghui Li, Haoyu Wang, Shengshan Hu, Dengpan Ye, Hai Jin, Libing Wu, Chaowei Xiao

TL;DR
This paper introduces TeCo, a test-time method for detecting backdoor trigger samples fed to neural networks. It relies solely on the model's hard-label outputs and exploits the difference in corruption robustness between clean and triggered samples.
Contribution
TeCo is a new detection approach that requires no extra data or trigger knowledge, using corruption robustness consistency to identify backdoor trigger samples during inference.
Findings
TeCo outperforms existing methods with higher AUROC scores.
It demonstrates robustness across various datasets, attacks, and models.
TeCo achieves five times greater stability than state-of-the-art defenses.
Abstract
Deep neural networks are proven to be vulnerable to backdoor attacks. Detecting the trigger samples during the inference stage, i.e., the test-time trigger sample detection, can prevent the backdoor from being triggered. However, existing detection methods often require the defenders to have high accessibility to victim models, extra clean data, or knowledge about the appearance of backdoor triggers, limiting their practicality. In this paper, we propose the test-time corruption robustness consistency evaluation (TeCo), a novel test-time trigger sample detection method that only needs the hard-label outputs of the victim models without any extra information. Our journey begins with the intriguing observation that the backdoor-infected models have similar performance across different image corruptions for the clean images, but perform discrepantly for the trigger samples. Based on this…
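The observation in the abstract can be illustrated with a minimal sketch. It queries only hard labels, applies several corruptions at increasing severity, records the first severity at which the predicted label changes, and scores an input by how much those flip points disagree across corruption types. The three corruptions, their severity scales, the use of the standard deviation as the consistency measure, and all function names here are assumptions for illustration; the truncated abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np

# Hypothetical corruptions on images scaled to [0, 1]; severity is an int >= 1.
def gaussian_noise(img, severity, rng):
    return np.clip(img + rng.normal(0.0, 0.04 * severity, img.shape), 0.0, 1.0)

def brightness(img, severity, rng):
    return np.clip(img + 0.1 * severity, 0.0, 1.0)

def contrast(img, severity, rng):
    mean = img.mean()
    return np.clip((img - mean) * (1.0 - 0.15 * severity) + mean, 0.0, 1.0)

CORRUPTIONS = (gaussian_noise, brightness, contrast)

def first_flip_severity(img, predict, corrupt, max_severity, rng):
    """Return the lowest severity at which the hard label changes.

    `predict` is any callable mapping an image to a hard label.
    If the label never changes, return max_severity + 1.
    """
    base = predict(img)
    for severity in range(1, max_severity + 1):
        if predict(corrupt(img, severity, rng)) != base:
            return severity
    return max_severity + 1

def consistency_score(img, predict, corruptions=CORRUPTIONS, max_severity=5, seed=0):
    """Standard deviation of the flip severities across corruption types.

    Assumption: a larger spread means less consistent corruption robustness,
    which the abstract associates with trigger samples.
    """
    rng = np.random.default_rng(seed)
    flips = [
        first_flip_severity(img, predict, corrupt, max_severity, rng)
        for corrupt in corruptions
    ]
    return float(np.std(flips))
```

In use, a defender would compute `consistency_score` for each incoming sample and flag those above a threshold chosen for the deployment. A sample whose label flips at roughly the same severity under every corruption scores near zero, while one that is fragile under some corruptions and stable under others scores higher.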
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
