MISA: Online Defense of Trojaned Models using Misattributions

Panagiota Kiourti, Wenchao Li, Anirban Roy, Karan Sikka, and Susmit Jha

arXiv:2103.15918 · cs.CR · September 27, 2021

Open Access

TL;DR

MISA is an online detection method for Trojan triggers in neural networks that analyzes misattributions in feature space, achieving high accuracy without prior trigger pattern assumptions.

Contribution

Introduces MISA, a novel online Trojan detection approach based on misattributions, effective across diverse trigger patterns and benchmarks.

Findings

01. Achieves 96% AUC in Trojan trigger detection

02. Effective against recent unseen trigger patterns

03. Operates without assumptions on trigger pattern

Abstract

Recent studies have shown that neural networks are vulnerable to Trojan attacks, where a network is trained to respond to specially crafted trigger patterns in the inputs in specific and potentially malicious ways. This paper proposes MISA, a new online approach to detect Trojan triggers for neural networks at inference time. Our approach is based on a novel notion called misattributions, which captures the anomalous manifestation of a Trojan activation in the feature space. Given an input image and the corresponding output prediction, our algorithm first computes the model's attribution on different features. It then statistically analyzes these attributions to ascertain the presence of a Trojan trigger. Across a set of benchmarks, we show that our method can effectively detect Trojan triggers for a wide variety of trigger patterns, including several recent ones for which there are no…
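The two steps the abstract describes (attribute the prediction to features, then test those attributions statistically) can be illustrated with a toy sketch. This is not the authors' implementation: the linear model `WEIGHTS`, the gradient-times-input attribution, the median/MAD outlier score, and the threshold of 10 are all assumptions chosen to keep the example small and self-contained.

```python
import random
import statistics

# Hypothetical toy model: 2 classes, 4 features, linear logits.
# Feature 3 stands in for a trigger that the model has learned to key on.
WEIGHTS = [
    [1.0, 0.5, 0.0, 0.0],  # class 0
    [0.0, 0.5, 1.0, 3.0],  # class 1 (target class of the toy Trojan)
]

def predict(x):
    """Return the index of the class with the largest logit."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]
    return max(range(len(logits)), key=logits.__getitem__)

def attribute(x, cls):
    """Gradient-times-input attribution (exact for a linear logit)."""
    return [w * xi for w, xi in zip(WEIGHTS[cls], x)]

def fit_reference(clean_inputs):
    """Per predicted class, store (median, MAD) of each feature's attribution."""
    by_class = {}
    for x in clean_inputs:
        cls = predict(x)
        by_class.setdefault(cls, []).append(attribute(x, cls))
    ref = {}
    for cls, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            med = statistics.median(col)
            mad = statistics.median([abs(v - med) for v in col])
            stats.append((med, mad))
        ref[cls] = stats
    return ref

def anomaly_score(x, ref, eps=1e-6):
    """Largest robust z-score of any feature attribution for this input."""
    cls = predict(x)
    attr = attribute(x, cls)
    return max(abs(a - med) / (mad + eps)
               for a, (med, mad) in zip(attr, ref[cls]))

def is_suspicious(x, ref, threshold=10.0):
    """Flag inputs whose attributions are far outside the clean range."""
    return anomaly_score(x, ref) > threshold

# Clean data: the trigger feature (index 3) stays close to zero.
rng = random.Random(0)
clean = [[rng.random(), rng.random(), rng.random(), 0.05 * rng.random()]
         for _ in range(500)]
ref = fit_reference(clean)

clean_x = [0.2, 0.5, 0.9, 0.02]     # ordinary class-1 input
triggered_x = [0.2, 0.5, 0.3, 1.0]  # same kind of input with the trigger set

print(is_suspicious(clean_x, ref), is_suspicious(triggered_x, ref))
```

In this sketch the triggered input is predicted as class 1 almost entirely because of feature 3, so its attribution on that feature lies far outside what clean class-1 inputs produce. That gap is the kind of anomaly an attribution-based detector looks for; a real model would need a proper attribution method and a calibrated detector in place of these stand-ins.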

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Malware Detection Techniques