Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free
Tianlong Chen, Zhenyu Zhang, Yihua Zhang, Shiyu Chang, Sijia Liu, Zhangyang Wang

TL;DR
This paper introduces a method for detecting Trojan attacks in deep neural networks by leveraging network sparsity: pruning is used to isolate a sparse subnetwork that retains the Trojan features after most benign features have been pruned away.
Contribution
The paper proposes a new Trojan detection approach based on network pruning and the lottery ticket hypothesis, effective even without clean training data.
Findings
Effective detection across multiple datasets and architectures
Trojan features are more stable to pruning than benign features
Achieves high accuracy in isolating Trojan information
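The pruning-stability finding can be probed with a small experiment: prune a suspect network to increasing sparsity and track clean accuracy and attack success rate side by side. The sketch below is a minimal illustration, not the authors' procedure. It assumes one-shot global magnitude pruning (a common choice swapped in here, since this summary does not specify the pruning scheme), and the function names `global_magnitude_masks`, `clean_accuracy`, and `attack_success_rate` are hypothetical.

```python
import numpy as np


def global_magnitude_masks(weights, sparsity):
    """Return one boolean mask per array, keeping the largest-magnitude weights.

    weights:  list of NumPy arrays (hypothetical per-layer weights).
    sparsity: fraction of all weights to remove, in [0, 1).
    Assumption: one-shot global magnitude pruning, chosen for illustration.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = np.concatenate([np.abs(w).ravel() for w in weights])
    n_prune = int(sparsity * magnitudes.size)
    if n_prune == 0:
        return [np.ones(w.shape, dtype=bool) for w in weights]
    # Threshold is the n_prune-th smallest magnitude.
    # Weights tied with the threshold are pruned as well.
    threshold = np.partition(magnitudes, n_prune - 1)[n_prune - 1]
    return [np.abs(w) > threshold for w in weights]


def clean_accuracy(preds, labels):
    """Fraction of clean inputs classified correctly."""
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))


def attack_success_rate(triggered_preds, target_label):
    """Fraction of trigger-stamped inputs classified as the attacker's target."""
    return float(np.mean(np.asarray(triggered_preds) == target_label))


if __name__ == "__main__":
    layers = [np.array([[0.1, -2.0], [0.5, 3.0]]), np.array([-0.05, 1.0])]
    masks = global_magnitude_masks(layers, sparsity=0.5)
    kept = sum(int(m.sum()) for m in masks)
    total = sum(m.size for m in masks)
    print(f"kept {kept} of {total} weights")
```

Under these assumptions, one would apply the masks to the network, re-evaluate both metrics at each sparsity level, and look for a subnetwork whose attack success rate stays high while its clean accuracy drops. That pattern is what the stability finding predicts for a Trojaned model.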
Abstract
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a particular trigger. Several works attempt to detect whether a given DNN has been injected with a specific trigger during the training. In a parallel line of research, the lottery ticket hypothesis reveals the existence of sparse subnetworks which are capable of reaching competitive performance as the dense network after independent training. Connecting these two dots, we investigate the problem of Trojan DNN detection from the brand new lens of sparsity, even when no clean training data is available. Our crucial observation is that the Trojan features are significantly more stable to network pruning than benign features. Leveraging that, we propose a novel Trojan network detection regime: first locating a "winning Trojan…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)
Methods: Pruning
