Cassandra: Detecting Trojaned Networks from Adversarial Perturbations
Xiaoyu Zhang, Ajmal Mian, Rohit Gupta, Nazanin Rahnavard, Mubarak Shah

TL;DR
This paper introduces a novel method for detecting Trojaned neural networks by analyzing adversarial perturbations, achieving over 92% accuracy and demonstrating invariance to trigger variations, data, and architecture.
Contribution
The paper presents a new Trojan detection technique based on network fingerprints derived from adversarial perturbations, with a large-scale evaluation on diverse datasets.
Findings
Achieves over 92% detection accuracy.
Effective across different trigger types, sizes, and architectures.
Largest study to date on Trojaned network detection.
Abstract
Deep neural networks are being widely deployed for many critical tasks due to their high classification accuracy. In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models. These malicious behaviors can be triggered at the adversary's will and hence, cause a serious threat to the widespread deployment of deep models. We propose a method to verify if a pre-trained model is Trojaned or benign. Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients. Inserting backdoors into a network alters its decision boundaries which are effectively encoded in their adversarial perturbations. We train a two stream network for Trojan detection from its global (L∞ and L2 bounded) perturbations and the localized region of high energy…
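The fingerprint idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a toy linear softmax classifier in NumPy, an L∞ bound on the perturbation, and a simple sliding-window search for the high-energy region. The names `universal_perturbation`, `high_energy_window`, `eps`, `step`, and `iters` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def universal_perturbation(W, b, X, y, eps=0.1, step=0.01, iters=50):
    """One input-agnostic perturbation for a linear softmax model (assumed toy model).

    Projected sign-gradient ascent on the mean cross-entropy loss.
    W: (d, k) weights, b: (k,) bias, X: (n, d) inputs, y: (n,) integer labels.
    The result is clipped to the L-infinity ball of radius eps.
    """
    delta = np.zeros(X.shape[1])
    onehot = np.eye(W.shape[1])[y]
    for _ in range(iters):
        p = softmax((X + delta) @ W + b)
        # Gradient of the mean cross-entropy with respect to delta.
        grad = ((p - onehot) @ W.T).mean(axis=0)
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta

def high_energy_window(delta_img, size):
    """Top-left (row, col) of the size x size window with the largest squared energy."""
    energy = delta_img ** 2
    h, w = energy.shape
    best, best_pos = -1.0, (0, 0)
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            e = energy[r:r + size, c:c + size].sum()
            if e > best:
                best, best_pos = e, (r, c)
    return best_pos
```

In this sketch, the perturbation returned by `universal_perturbation` stands in for the global fingerprint, and a crop of it at the position returned by `high_energy_window` (after reshaping the perturbation to image dimensions) stands in for the localized one. A detector would then be trained on such pairs collected from many benign and Trojaned models; that step is not shown here.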
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques
