Isolate Trigger: Detecting and Eliminating Adaptive Backdoor Attacks

Chengrui Sun; Hua Zhang; Haoran Gao; Shang Wang; Zian Tian; Jianjin Zhao; Qi Li; Hongliang Zhu; Zongliang Shen; Anmin Fu

arXiv:2508.04094 · cs.CR · November 18, 2025

Open Access

TL;DR

This paper introduces Isolate Trigger (IsTr), a novel framework for detecting and mitigating adaptive backdoor attacks in deep learning models, effectively handling complex trigger patterns entangled with benign features.

Contribution

IsTr is the first comprehensive method to detect and eliminate adaptive backdoors entangled with benign features, outperforming existing defenses in accuracy and efficiency.

Findings

01. Achieves over 95% detection accuracy across various scenarios.
02. Reduces detection overhead by an order of magnitude.
03. Keeps the attack success rate (ASR) below 3% after repair.
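The findings rest on two standard metrics, detection accuracy and attack success rate. Below is a minimal sketch of their conventional definitions, assuming integer class predictions and a single attacker-chosen target class. The function names are hypothetical, and the paper's exact evaluation protocol (for example, whether target-class inputs are excluded from ASR) may differ.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels (hypothetical helper).

    Applies to clean-input accuracy (per-sample class labels) and, by treating
    each model as one item with label 1 = backdoored / 0 = clean, to detection
    accuracy over a pool of models.
    """
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int], true_labels: Sequence[int], target: int
) -> float:
    """Share of triggered inputs the model assigns to the attacker's target.

    Inputs whose true label already equals `target` are skipped, a common
    convention (assumed here) since predicting the target for them is not
    evidence of the backdoor.
    """
    if len(triggered_preds) != len(true_labels):
        raise ValueError("triggered_preds and true_labels must be equal in length")
    kept = [p for p, y in zip(triggered_preds, true_labels) if y != target]
    if not kept:
        raise ValueError("need at least one triggered input from a non-target class")
    return sum(p == target for p in kept) / len(kept)
```

Under these definitions, a repaired model meeting the third finding would give `attack_success_rate(...) < 0.03` on triggered test inputs.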

Abstract

Deep learning models are widely deployed in various applications but remain vulnerable to stealthy adversarial threats, particularly backdoor attacks. Backdoored models trained on poisoned datasets behave normally on clean inputs but mispredict when a specific trigger is present. Most existing backdoor defenses assume that an adversary injects only a single backdoor with a small, conspicuous trigger. However, adaptive backdoors that entangle multiple trigger patterns with benign features can effectively bypass existing defenses. To defend against these attacks, we propose Isolate Trigger (IsTr), an accurate and efficient framework for backdoor detection and mitigation. IsTr aims to eliminate the influence of benign features and reverse hidden triggers. IsTr is motivated by the observation that a model's feature extractor focuses more on benign features while its classifier focuses…
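The backdoor behavior described in the abstract comes from training on poisoned data. Below is a minimal sketch of the classic single-patch poisoning scheme (BadNets-style), assuming grayscale images stored as nested lists and one attacker-chosen target class. This illustrates the simple, conspicuous-trigger case that the abstract says most existing defenses assume. It is not the adaptive, feature-entangled attack IsTr targets, and `stamp_trigger` and `poison_dataset` are hypothetical names.

```python
import random
from typing import List, Tuple

# Hypothetical representation: a grayscale image as rows of intensities in [0, 1].
Image = List[List[float]]


def stamp_trigger(img: Image, size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of img with a solid square patch in the bottom-right corner."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the clean sample is left untouched
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def poison_dataset(
    data: List[Tuple[Image, int]], target: int, rate: float, seed: int = 0
) -> List[Tuple[Image, int]]:
    """Stamp the trigger on a random fraction `rate` of samples and relabel them to `target`."""
    rng = random.Random(seed)  # seeded so the poisoned subset is reproducible
    n_poison = int(round(rate * len(data)))
    idx = set(rng.sample(range(len(data)), n_poison))
    return [
        (stamp_trigger(img), target) if i in idx else (img, label)
        for i, (img, label) in enumerate(data)
    ]
```

A model trained on the output learns to associate the corner patch with `target` while still fitting the clean samples, which is why it behaves normally until the trigger appears. Adaptive attacks instead entangle multiple trigger patterns with benign features, so no single small patch stands out.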

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Physical Unclonable Functions (PUFs) and Hardware Security