MISA: Online Defense of Trojaned Models using Misattributions
Panagiota Kiourti, Wenchao Li, Anirban Roy, Karan Sikka, and Susmit Jha

TL;DR
MISA is an online detection method for Trojan triggers in neural networks that analyzes misattributions in feature space, achieving high accuracy without prior trigger pattern assumptions.
Contribution
Introduces MISA, a novel online Trojan detection approach based on misattributions, effective across diverse trigger patterns and benchmarks.
Findings
Achieves 96% AUC in Trojan trigger detection
Effective against recent unseen trigger patterns
Operates without assumptions on trigger pattern
Abstract
Recent studies have shown that neural networks are vulnerable to Trojan attacks, where a network is trained to respond to specially crafted trigger patterns in the inputs in specific and potentially malicious ways. This paper proposes MISA, a new online approach to detect Trojan triggers for neural networks at inference time. Our approach is based on a novel notion called misattributions, which captures the anomalous manifestation of a Trojan activation in the feature space. Given an input image and the corresponding output prediction, our algorithm first computes the model's attribution on different features. It then statistically analyzes these attributions to ascertain the presence of a Trojan trigger. Across a set of benchmarks, we show that our method can effectively detect Trojan triggers for a wide variety of trigger patterns, including several recent ones for which there are no…
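The two-step pipeline described in the abstract (compute per-feature attributions, then analyze them statistically) can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not say which attribution method or statistical test MISA uses, so the linear scorer, the `weights * (x - baseline)` attribution, the per-feature z-score test, and every name below (`attributions`, `fit_clean_stats`, `anomaly_score`, the planted feature index 3) are assumptions chosen only to keep the example small and runnable.

```python
import numpy as np

def attributions(weights, x, baseline):
    """Per-feature attribution for a linear scorer: w_i * (x_i - baseline_i).
    Assumption: a stand-in for whatever attribution method the paper uses."""
    return weights * (x - baseline)

def fit_clean_stats(weights, clean_inputs, baseline):
    """Estimate the mean and std of each feature's attribution on clean inputs."""
    attrs = np.array([attributions(weights, x, baseline) for x in clean_inputs])
    return attrs.mean(axis=0), attrs.std(axis=0) + 1e-8

def anomaly_score(weights, x, baseline, mean, std):
    """Largest absolute z-score of this input's attributions against the
    clean statistics. Assumption: a simple stand-in statistical test."""
    z = np.abs((attributions(weights, x, baseline) - mean) / std)
    return float(z.max())

rng = np.random.default_rng(0)
n_features = 16

# Hypothetical "Trojaned" linear model: feature 3 carries an outsized weight.
weights = rng.normal(size=n_features)
weights[3] = 8.0
baseline = np.zeros(n_features)

# Reference statistics gathered from clean inputs only.
clean_inputs = rng.normal(scale=0.5, size=(200, n_features))
mean, std = fit_clean_stats(weights, clean_inputs, baseline)

# One clean input, and the same input with a hypothetical trigger stamped on feature 3.
clean_x = rng.normal(scale=0.5, size=n_features)
triggered_x = clean_x.copy()
triggered_x[3] = 5.0

clean_score = anomaly_score(weights, clean_x, baseline, mean, std)
triggered_score = anomaly_score(weights, triggered_x, baseline, mean, std)
print(f"clean score: {clean_score:.2f}, triggered score: {triggered_score:.2f}")
```

In this toy setting the triggered input concentrates an unusually large attribution on one feature, so its score stands well apart from the clean one. A detector along these lines would flag inputs whose score exceeds a threshold calibrated on clean data; how MISA actually sets that decision rule is not stated in the abstract.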
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Malware Detection Techniques
