Scanning Trojaned Models Using Out-of-Distribution Samples
Hossein Mirzaei, Ali Ansari, Bahar Dibaei Nia, Mojtaba Nafez, Moein Madadi, Sepehr Rezaee, Zeinab Sadat Taghavi, Arad Maleki, Kian Shamsaie, Mahdi Hajialilue, Jafar Habibi, Mohammad Sabokrou, Mohammad Hossein Rohban

TL;DR
This paper introduces TRODO, a method that detects trojaned neural networks by identifying adversarial shifts in out-of-distribution samples. It is reported to be effective across various attack types and training scenarios.
Contribution
TRODO is a new trojan detection approach that makes no prior assumptions about the attack, requires no training data, and works even on classifiers trojaned using adversarial training.
Findings
High detection accuracy across multiple datasets
Effective against adversarially trained trojaned models
Works without training data or prior attack knowledge
Abstract
Scanning for trojans (backdoors) in deep neural networks is crucial due to their significant real-world applications. There has been an increasing focus on developing effective general trojan scanning methods across various trojan attacks. Despite advancements, there remains a shortage of methods that perform effectively without preconceived assumptions about the backdoor attack method. Additionally, we have observed that current methods struggle to identify classifiers trojaned using adversarial training. Motivated by these challenges, our study introduces a novel scanning method named TRODO (TROjan scanning by Detection of adversarial shifts in Out-of-distribution samples). TRODO leverages the concept of "blind spots"--regions where trojaned classifiers erroneously identify out-of-distribution (OOD) samples as in-distribution (ID). We scan for these blind spots by adversarially shifting…
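The abstract is cut off before it describes the scanning procedure, so the following is only a minimal sketch of the general idea of an "adversarial shift" on an OOD sample, not the paper's algorithm. It assumes a toy linear softmax classifier, uses the maximum softmax probability as a stand-in ID-score, and applies a sign-gradient ascent bounded in the L-infinity norm. The names `id_score`, `shift_toward_id`, and `id_score_shift`, and the parameters `eps`, `step`, and `iters`, are all hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def probs(W, b, x):
    """Class probabilities of a toy linear classifier: softmax(W x + b)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def id_score(W, b, x):
    """Assumed ID-score: maximum softmax probability (not taken from the paper)."""
    return max(probs(W, b, x))

def id_score_grad(W, b, x):
    """Gradient of the max softmax probability with respect to the input x."""
    p = probs(W, b, x)
    k = max(range(len(p)), key=p.__getitem__)
    # d p_k / d x_i = p_k * (W[k][i] - sum_j p_j * W[j][i])
    return [
        p[k] * (W[k][i] - sum(p[j] * W[j][i] for j in range(len(W))))
        for i in range(len(x))
    ]

def shift_toward_id(W, b, x, eps=0.1, step=0.02, iters=10):
    """Sign-gradient ascent on the ID-score, kept within an L-inf ball of radius eps."""
    x_adv = list(x)
    for _ in range(iters):
        g = id_score_grad(W, b, x_adv)
        x_adv = [xa + step * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # Project back so that each coordinate stays within eps of the original.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

def id_score_shift(W, b, x, **kwargs):
    """How much the ID-score of x rises after the bounded adversarial shift."""
    return id_score(W, b, shift_toward_id(W, b, x, **kwargs)) - id_score(W, b, x)

# Toy usage: a two-class linear model and one stand-in "OOD" input.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x_ood = [0.1, 0.0]
shift = id_score_shift(W, b, x_ood)
```

Under these assumptions, a scanner in this spirit might average `id_score_shift` over many OOD samples and compare models by that value. The actual score, perturbation scheme, and decision rule used by TRODO are not given in this excerpt.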
Taxonomy
Topics: Statistical Methods and Inference · Adversarial Robustness in Machine Learning
