Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

Tianlong Chen; Zhenyu Zhang; Yihua Zhang; Shiyu Chang; Sijia Liu; Zhangyang Wang

arXiv:2205.11819 · cs.LG · May 25, 2022 · 1 citation

TL;DR

This paper introduces a novel method for detecting Trojan attacks in deep neural networks by leveraging network sparsity, identifying a sparse subnetwork that retains Trojan features while being benign on clean inputs.

Contribution

The paper proposes a new Trojan detection approach based on network pruning and the lottery ticket hypothesis, effective even without clean training data.

Findings

1. Effective detection across multiple datasets and architectures.
2. Trojan features are more stable to pruning than benign features.
3. High accuracy in isolating Trojan information.
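The pruning-stability finding can be made concrete with two standard backdoor metrics: clean accuracy on benign inputs and attack success rate (ASR) on trigger-stamped inputs. The sketch below is an illustration, not the paper's evaluation code; the function names and the ASR convention of skipping inputs whose true label is already the target class are assumptions.

```python
import numpy as np

def clean_accuracy(preds, labels):
    """Fraction of clean inputs whose prediction matches the true label."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    return float(np.mean(preds == labels))

def attack_success_rate(triggered_preds, true_labels, target_class):
    """Fraction of trigger-stamped inputs sent to the attacker's target class.

    Inputs whose true label already equals the target class are excluded
    (an assumed, commonly used reporting convention).
    """
    triggered_preds = np.asarray(triggered_preds)
    true_labels = np.asarray(true_labels)
    keep = true_labels != target_class
    return float(np.mean(triggered_preds[keep] == target_class))

def retention(sparse_metric, dense_metric):
    """Share of the dense network's metric that survives after pruning."""
    return sparse_metric / dense_metric
```

When a pruned network and its dense parent are evaluated with these helpers, an ASR retention that stays close to 1 while clean-accuracy retention falls is one way to observe that Trojan features are more stable to pruning than benign ones.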

Abstract

Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet produce manipulated results for inputs carrying a particular trigger. Several works attempt to detect whether a given DNN has been injected with a specific trigger during training. In a parallel line of research, the lottery ticket hypothesis reveals the existence of sparse subnetworks that, after independent training, reach performance competitive with the dense network. Connecting these two dots, we investigate Trojan DNN detection through the new lens of sparsity, even when no clean training data is available. Our crucial observation is that Trojan features are significantly more stable to network pruning than benign features. Leveraging that, we propose a novel Trojan network detection regime: first locating a "winning Trojan…
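The lottery ticket hypothesis referenced here is usually explored with iterative magnitude pruning (IMP): train, remove the smallest-magnitude surviving weights, rewind the survivors to their initial values, and repeat. The following is a minimal NumPy sketch of that generic procedure, assuming layer weights are plain arrays and `train_fn` is a hypothetical user-supplied training callable. It is not the paper's Trojan-ticket search; the official implementation is in the vita-group/backdoor-lth repository.

```python
import numpy as np

def prune_remaining(weights, masks, rate):
    """Globally remove `rate` of the still-unpruned weights with the smallest |w|."""
    alive = np.concatenate([np.abs(w[m]) for w, m in zip(weights, masks)])
    k = int(rate * alive.size)  # number of surviving weights to remove this round
    if k == 0:
        return [m.copy() for m in masks]
    threshold = np.sort(alive)[k - 1]
    # Ties at the threshold can remove slightly more than k weights; fine for a sketch.
    return [m & (np.abs(w) > threshold) for w, m in zip(weights, masks)]

def iterative_magnitude_pruning(init_weights, train_fn, rounds, rate):
    """Lottery-ticket-style IMP with rewinding to the original initialization."""
    masks = [np.ones_like(w, dtype=bool) for w in init_weights]
    for _ in range(rounds):
        # Rewind: every round restarts from the initial weights under the current mask.
        # train_fn is hypothetical: (masked_weights, masks) -> trained weights.
        trained = train_fn([w * m for w, m in zip(init_weights, masks)], masks)
        masks = prune_remaining(trained, masks, rate)
    return masks

def sparsity(masks):
    """Fraction of all weights that have been pruned."""
    total = sum(m.size for m in masks)
    return 1.0 - sum(int(m.sum()) for m in masks) / total
```

Because each round removes `rate` of the remaining weights, sparsity after `r` rounds is roughly `1 - (1 - rate) ** r`, which lets IMP approach high sparsity gradually instead of in a single cut.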

Code & Models

Repositories

vita-group/backdoor-lth
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)

Methods: Pruning