NeuronInspect: Detecting Backdoors in Neural Networks via Output Explanations

Xijie Huang; Moustafa Alzantot; Mani Srivastava

arXiv:1911.07399·cs.CR·November 19, 2019·65 cites

TL;DR

NeuronInspect is a framework that detects trojan backdoors in neural networks by analyzing features of output explanation heatmaps, and it is reported to be more robust and effective than existing methods such as Neural Cleanse.

Contribution

It introduces a new explanation-based approach for backdoor detection, combining heatmap analysis and outlier detection to identify attack targets.

Findings

01. Effective detection on MNIST and GTSRB datasets
02. Outperforms Neural Cleanse in robustness and accuracy
03. Applicable to various attack scenarios

Abstract

Deep neural networks have achieved state-of-the-art performance on various tasks. However, their lack of interpretability and transparency makes it easier for malicious attackers to inject trojan backdoors into neural networks, causing the model to behave abnormally when a backdoor sample carrying a specific trigger is input. In this paper, we propose NeuronInspect, a framework to detect trojan backdoors in deep neural networks via output explanation techniques. NeuronInspect first identifies the existence of backdoor attack targets by generating the explanation heatmap of the output layer. We observe that heatmaps generated from clean and backdoored models have different characteristics. We therefore extract features that measure three attributes of explanations from an attacked model: they are sparse, smooth, and persistent. We combine these features and use outlier detection to figure out…
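The pipeline described in the abstract (per-class heatmaps, three features, outlier detection) can be illustrated with a minimal sketch. The abstract does not give the feature definitions or the outlier test, so everything below is an assumption made for illustration: sparseness is taken as the L1 norm of a heatmap, smoothness as the absolute response of a 4-neighbour Laplacian, persistence as the average deviation of a class's heatmaps from their mean, and the outlier test as a median-absolute-deviation rule. The function names, the equal weights, and the threshold are hypothetical and are not taken from the paper.

```python
import numpy as np


def sparseness(h):
    # Assumed feature: L1 norm of one heatmap. A small value means the
    # attribution is concentrated on few pixels.
    return float(np.abs(h).sum())


def smoothness(h):
    # Assumed feature: summed absolute 4-neighbour Laplacian over interior
    # pixels. A small value means the heatmap has few abrupt changes.
    lap = (h[:-2, 1:-1] + h[2:, 1:-1] + h[1:-1, :-2] + h[1:-1, 2:]
           - 4.0 * h[1:-1, 1:-1])
    return float(np.abs(lap).sum())


def persistence(hs):
    # Assumed feature: mean L1 distance of each heatmap from the class mean.
    # hs has shape (n_images, H, W). A small value means the same region is
    # highlighted regardless of the input image.
    return float(np.abs(hs - hs.mean(axis=0)).sum(axis=(1, 2)).mean())


def class_scores(heatmaps, weights=(1.0, 1.0, 1.0)):
    # heatmaps has shape (n_classes, n_images, H, W).
    # The weights are hypothetical; lower combined score = more suspicious.
    scores = []
    for hs in heatmaps:
        sp = np.mean([sparseness(h) for h in hs])
        sm = np.mean([smoothness(h) for h in hs])
        pe = persistence(hs)
        scores.append(weights[0] * sp + weights[1] * sm + weights[2] * pe)
    return np.array(scores)


def flag_outliers(scores, threshold=2.0):
    # Assumed outlier rule: flag classes whose score lies far below the
    # median, measured in units of the scaled median absolute deviation.
    med = np.median(scores)
    mad = 1.4826 * np.median(np.abs(scores - med))
    deviation = (med - scores) / (mad + 1e-12)
    return np.where(deviation > threshold)[0]
```

In this sketch, `heatmaps` would come from whatever explanation method is applied to the output layer; a class index returned by `flag_outliers` is a candidate attack target, not a confirmed one.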

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications

Methods: Interpretability · Heatmap