Explanatory Masks for Neural Network Interpretability

Lawrence Phillips; Garrett Goh; Nathan Hodas

arXiv:1911.06876 · cs.LG · November 19, 2019 · 1 citation

Open Access

TL;DR

This paper introduces a method to generate explanation masks for pre-trained neural networks, helping to identify input features crucial for their predictions across various domains.

Contribution

The paper proposes a secondary network that produces minimal explanation masks while preserving the original network's predictive accuracy. The approach applies to multiple architectures.

Findings

01. Effective explanation masks for CNNs, RNNs, and mixed architectures

02. Masks localize key input features for predictions

03. Method preserves accuracy with minimal explanations

Abstract

Neural network interpretability is a vital component for applications across a wide variety of domains. In such cases it is often useful to analyze a network which has already been trained for its specific purpose. In this work, we develop a method to produce explanation masks for pre-trained networks. The mask localizes the most important aspects of each input for prediction of the original network. Masks are created by a secondary network whose goal is to create as small an explanation as possible while still preserving the predictive accuracy of the original network. We demonstrate the applicability of our method for image classification with CNNs, sentiment analysis with RNNs, and chemical property prediction with mixed CNN/RNN architectures.
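
The trade-off described in the abstract, a mask that is as small as possible yet still preserves the original network's prediction, can be illustrated with a toy objective. The following is a minimal sketch and not the authors' implementation: the abstract does not state the loss, so the cross-entropy fidelity term, the mean-absolute-value size penalty, the `sparsity_weight` parameter, and the fixed linear `toy_classifier` standing in for the frozen pre-trained network are all assumptions made for illustration. In the paper the mask comes from a trained secondary network; here masks are written by hand so that only the objective is shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_classifier(x):
    """Hypothetical stand-in for the frozen pre-trained network.

    A fixed two-class linear model that only looks at features 0 and 2.
    """
    weights = [[2.0, 0.0, -1.0, 0.0],
               [-2.0, 0.0, 1.0, 0.0]]
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def apply_mask(x, mask):
    """Element-wise product: mask values near 0 hide a feature."""
    return [m * xi for m, xi in zip(mask, x)]


def mask_objective(x, mask, classifier, sparsity_weight=0.1):
    """Assumed objective: fidelity to the original prediction plus mask size.

    fidelity: cross-entropy between the classifier's output on the full
              input and its output on the masked input.
    size:     mean absolute mask value (an L1-style penalty).
    """
    target = classifier(x)
    masked_pred = classifier(apply_mask(x, mask))
    fidelity = -sum(t * math.log(p + 1e-12) for t, p in zip(target, masked_pred))
    size = sum(abs(m) for m in mask) / len(mask)
    return fidelity + sparsity_weight * size


x = [1.0, 5.0, 0.5, 3.0]
full_mask = [1.0, 1.0, 1.0, 1.0]    # keeps everything
sparse_mask = [1.0, 0.0, 1.0, 0.0]  # keeps only the features the model uses
empty_mask = [0.0, 0.0, 0.0, 0.0]   # hides everything

scores = {
    "full": mask_objective(x, full_mask, toy_classifier),
    "sparse": mask_objective(x, sparse_mask, toy_classifier),
    "empty": mask_objective(x, empty_mask, toy_classifier),
}
```

Under these assumptions the sparse mask scores lowest: it leaves the toy classifier's output unchanged while paying half the size penalty of the full mask, and the empty mask is penalized because it destroys the prediction. A secondary network trained to minimize such an objective would be pushed toward masks of the sparse kind.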

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Materials Science

Methods: Interpretability