Isolate Trigger: Detecting and Eliminating Adaptive Backdoor Attacks
Chengrui Sun, Hua Zhang, Haoran Gao, Shang Wang, Zian Tian, Jianjin Zhao, Qi Li, Hongliang Zhu, Zongliang Shen, Anmin Fu

TL;DR
This paper introduces Isolate Trigger (IsTr), a novel framework for detecting and mitigating adaptive backdoor attacks in deep learning models, effectively handling complex trigger patterns entangled with benign features.
Contribution
IsTr is the first comprehensive method to detect and eliminate adaptive backdoors entangled with benign features, outperforming existing defenses in accuracy and efficiency.
Findings
Achieves over 95% detection accuracy in various scenarios.
Reduces detection overhead by an order of magnitude.
Keeps the attack success rate below 3% after repair.
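The attack success rate (ASR) cited above is conventionally defined as the fraction of trigger-stamped test inputs, whose true label is not already the attacker's target, that the model classifies as the target. The paper's exact evaluation protocol is not given on this page, so the sketch below assumes this standard definition. The function name `attack_success_rate` is hypothetical, and predictions are plain integer class labels.

```python
from typing import Sequence


def attack_success_rate(preds: Sequence[int], true_labels: Sequence[int], target: int) -> float:
    """Hypothetical helper: fraction of triggered inputs classified as `target`.

    Assumes the standard ASR definition. Samples whose true label already equals
    the target class are excluded, because predicting the target for them is not
    evidence that the backdoor fired.
    """
    pairs = [(p, y) for p, y in zip(preds, true_labels) if y != target]
    if not pairs:
        raise ValueError("no non-target samples to evaluate")
    hits = sum(1 for p, _ in pairs if p == target)
    return hits / len(pairs)


# Example: predictions of a (hypothetical) repaired model on trigger-stamped inputs.
preds = [0, 2, 2, 1, 2]
labels = [1, 2, 0, 1, 3]
print(attack_success_rate(preds, labels, target=2))  # 2 of 4 non-target samples hit -> 0.5
```

Under this definition, "ASR below 3% after repair" means fewer than 3 in every 100 triggered non-target inputs still reach the attacker's label.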
Abstract
Deep learning models are widely deployed in various applications but remain vulnerable to stealthy adversarial threats, particularly backdoor attacks. Backdoored models trained on poisoned datasets behave normally on clean inputs but mispredict when a specific trigger is present. Most existing backdoor defenses assume that the adversary injects only one backdoor with a small, conspicuous trigger. However, adaptive backdoors that entangle multiple trigger patterns with benign features can effectively bypass existing defenses. To defend against these attacks, we propose Isolate Trigger (IsTr), an accurate and efficient framework for backdoor detection and mitigation. IsTr aims to eliminate the influence of benign features and reverse-engineer hidden triggers. IsTr is motivated by the observation that a model's feature extractor focuses more on benign features while its classifier focuses…
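To make "poisoned datasets" and "trigger" concrete, the sketch below uses the mask-and-pattern formulation common in the backdoor literature (for example, in Neural Cleanse-style trigger reverse engineering): a triggered input is $x' = (1 - m) \odot x + m \odot p$, where $m$ is a mask in $[0, 1]$ and $p$ is the trigger pattern. This is an assumption for illustration only. It models the simple patch triggers that most existing defenses assume, not the adaptive, benign-feature-entangled triggers IsTr targets, and it is not IsTr's method. The helpers `stamp_trigger` and `poison_dataset` are hypothetical, and images are plain nested lists of grayscale values.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # hypothetical grayscale image: rows of pixel values


def stamp_trigger(x: Image, mask: Image, pattern: Image) -> Image:
    """Blend a trigger into an image element-wise: x' = (1 - m) * x + m * p.

    Mask values lie in [0, 1]; a mask value of 1 replaces the pixel with the pattern.
    """
    if not (len(x) == len(mask) == len(pattern)):
        raise ValueError("shape mismatch")
    out: Image = []
    for row_x, row_m, row_p in zip(x, mask, pattern):
        if not (len(row_x) == len(row_m) == len(row_p)):
            raise ValueError("shape mismatch")
        out.append([(1.0 - m) * v + m * p for v, m, p in zip(row_x, row_m, row_p)])
    return out


def poison_dataset(
    samples: Sequence[Image],
    labels: Sequence[int],
    mask: Image,
    pattern: Image,
    target: int,
    rate: float,
) -> Tuple[List[Image], List[int]]:
    """Stamp the trigger on the first `rate` fraction of samples and relabel them to `target`.

    Real attacks usually pick the poisoned subset at random; taking the first
    samples keeps this sketch deterministic.
    """
    n_poison = int(len(samples) * rate)
    new_x: List[Image] = []
    new_y: List[int] = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        if i < n_poison:
            new_x.append(stamp_trigger(x, mask, pattern))
            new_y.append(target)  # poisoned label: the attacker's target class
        else:
            new_x.append(x)
            new_y.append(y)
    return new_x, new_y


# A 2x2 "image" with a one-pixel trigger in the top-left corner.
clean = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1.0, 0.0], [0.0, 0.0]]
pattern = [[1.0, 1.0], [1.0, 1.0]]
print(stamp_trigger(clean, mask, pattern))  # [[1.0, 0.0], [0.0, 0.0]]
```

A model trained on the output of `poison_dataset` learns to associate the stamped pixels with `target` while still fitting the clean samples, which is why it behaves normally on clean inputs.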
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Physical Unclonable Functions (PUFs) and Hardware Security
