Interpretation of Neural Networks is Fragile

Amirata Ghorbani; Abubakar Abid; James Zou

arXiv:1710.10547 · stat.ML · November 7, 2018 · 77 cites

TL;DR

This paper demonstrates that current deep learning interpretation methods are highly fragile, with small input perturbations causing significant changes in explanations, raising concerns about their reliability.
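Gradient-based saliency maps are the simplest of the feature-importance methods the paper examines. They score each input feature by the magnitude of the model output's gradient with respect to that feature. The sketch below computes such a map for a hypothetical two-layer tanh model. It then measures how much two maps agree using a top-k intersection, meaning the fraction of the k most salient features they share, which is one way to quantify a change in explanation. The model, its random weights, `k`, and the function names are illustrative assumptions, not the paper's code.

```python
import numpy as np


def saliency(x, W, v):
    """Simple-gradient saliency |df/dx| for the toy model f(x) = v . tanh(W x)."""
    t = np.tanh(W @ x)
    grad = W.T @ (v * (1.0 - t ** 2))  # chain rule through tanh
    return np.abs(grad)


def top_k_intersection(s1, s2, k):
    """Fraction of the k most salient features shared by two saliency maps."""
    top1 = set(np.argsort(s1)[-k:])
    top2 = set(np.argsort(s2)[-k:])
    return len(top1 & top2) / k


rng = np.random.default_rng(0)
d, h = 100, 32
W = rng.normal(size=(h, d)) / np.sqrt(d)  # hypothetical weights
v = rng.normal(size=h)
x = rng.normal(size=d)
delta = 0.01 * rng.choice([-1.0, 1.0], size=d)  # small random-sign perturbation

s_clean, s_pert = saliency(x, W, v), saliency(x + delta, W, v)
print(top_k_intersection(s_clean, s_clean, k=10))  # 1.0 by construction
print(top_k_intersection(s_clean, s_pert, k=10))
```

A random-sign perturbation like `delta` is only a baseline. The fragility the paper reports concerns perturbations chosen adversarially to change the map.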

Contribution

The paper systematically analyzes and reveals the extreme fragility of popular interpretation methods for deep neural networks on image datasets.

Findings

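01 Small, perceptually indistinguishable perturbations that leave the predicted label unchanged can drastically change feature-importance explanations.

02 Exemplar-based interpretations (e.g., influence functions) are similarly susceptible to adversarial attack.

03 Analysis of the geometry of the Hessian matrix gives insight into why robustness is a general challenge for current interpretation approaches.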
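Exemplar-based explanations rank training points by how strongly they influence a prediction. The sketch below assumes the influence-function formulation of Koh and Liang (2017). For each training point it computes the effect of upweighting that point on a test loss, I_up,loss(z_i, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z_i). It uses L2-regularised linear regression because the Hessian H is exact and cheap there. The model, the regulariser `lam`, and the function names are illustrative assumptions. The paper studies influence functions on neural networks, not this convex toy.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Minimise (1/n) * sum_i 0.5 * (x_i @ theta - y_i)**2 + (lam / 2) * ||theta||**2."""
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)  # Hessian of the regularised empirical risk
    theta = np.linalg.solve(H, X.T @ y / n)
    return theta, H


def influence_on_test_loss(X, y, x_test, y_test, lam):
    """Return I_up,loss(z_i, z_test) for every training point z_i = (X[i], y[i])."""
    theta, H = fit_ridge(X, y, lam)
    train_grads = (X @ theta - y)[:, None] * X      # row i: gradient of L(z_i, theta)
    test_grad = (x_test @ theta - y_test) * x_test  # gradient of L(z_test, theta)
    # H is symmetric, so a single solve with the test gradient serves every i.
    return -train_grads @ np.linalg.solve(H, test_grad)


# Usage on synthetic data: the most negative scores mark training points whose
# upweighting would most reduce the test loss ("most helpful" exemplars).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)
scores = influence_on_test_loss(X, y, X[0] + 0.05, y[0], lam=1e-2)
print(np.argsort(scores)[:5])
```

An attack on this kind of explanation perturbs the test input so that this ranking changes while the prediction stays the same.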
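To first order, a perturbation δ moves the input gradient by the input Hessian times δ: ∇f(x + δ) ≈ ∇f(x) + H(x) δ. Gradient-based explanations therefore shift most along directions of large curvature. The sketch below checks this relation on a hypothetical two-layer tanh model. Tanh is chosen because its input Hessian is non-zero, whereas a ReLU network is piecewise linear and its input Hessian vanishes almost everywhere. This illustrates the first-order relation only and is not the paper's analysis. The model, `eps`, and all names are assumptions.

```python
import numpy as np


def model_grad(x, W, v):
    """Input gradient of the toy model f(x) = v . tanh(W x)."""
    t = np.tanh(W @ x)
    return W.T @ (v * (1.0 - t ** 2))


def model_hessian(x, W, v):
    """Input Hessian of f: W^T diag(v * d/dz sech^2(z)) W with z = W x."""
    t = np.tanh(W @ x)
    curvature = v * (-2.0 * t * (1.0 - t ** 2))  # tanh'' scaled by the output weights
    return W.T @ (curvature[:, None] * W)


rng = np.random.default_rng(0)
d, h = 50, 16
W = rng.normal(size=(h, d)) / np.sqrt(d)
v = rng.normal(size=h)
x = rng.normal(size=d)

H = model_hessian(x, W, v)
eigvals, eigvecs = np.linalg.eigh(H)
i_max = np.argmax(np.abs(eigvals))
eps = 1e-3
delta_top = eps * eigvecs[:, i_max]              # direction of largest curvature
delta_rand = rng.normal(size=d)
delta_rand *= eps / np.linalg.norm(delta_rand)   # random direction, same norm

g0 = model_grad(x, W, v)
change_top = np.linalg.norm(model_grad(x + delta_top, W, v) - g0)
change_rand = np.linalg.norm(model_grad(x + delta_rand, W, v) - g0)
first_order = np.linalg.norm(H @ delta_top)      # equals eps * |largest eigenvalue|
print(change_top, first_order, change_rand)
```

Because ‖Hδ‖ ≤ |λ_max| ‖δ‖ for every δ, no perturbation of the same size moves the gradient further (to first order) than one along the top eigenvector.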

Abstract

In order for machine learning to be deployed and trusted in many applications, it is crucial to be able to reliably explain why the machine learning algorithm makes certain predictions. For example, if an algorithm classifies a given pathology image to be a malignant tumor, then the doctor may need to know which parts of the image led the algorithm to this classification. How to interpret black-box predictors is thus an important and active area of research. A fundamental question is: how much can we trust the interpretation itself? In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptually indistinguishable inputs with the same predicted label can be assigned very different interpretations. We systematically characterize the fragility of several widely-used feature-importance interpretation methods (saliency…
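The paper generates such perturbation pairs with dedicated iterative attacks. The sketch below substitutes a much simpler random-search hill climb so that it needs no autodiff. It searches an L∞ ball of radius `eps` for a perturbation that keeps the predicted label while lowering the top-k overlap between the original and perturbed saliency maps. The toy tanh classifier, `eps`, `k`, the step size, and every function name are hypothetical.

```python
import numpy as np


def predict(x, W, v):
    """Hypothetical binary classifier: label 1 iff f(x) = v . tanh(W x) > 0."""
    return int(v @ np.tanh(W @ x) > 0)


def saliency(x, W, v):
    """Simple-gradient saliency |df/dx| of the same toy model."""
    t = np.tanh(W @ x)
    return np.abs(W.T @ (v * (1.0 - t ** 2)))


def top_k_intersection(s1, s2, k):
    """Fraction of the k most salient features shared by two maps."""
    return len(set(np.argsort(s1)[-k:]) & set(np.argsort(s2)[-k:])) / k


def random_search_attack(x, W, v, eps, k, n_iter=2000, seed=0):
    """Hill-climb inside the L-inf ball of radius eps, keeping only candidates
    that preserve the predicted label and do not increase top-k overlap."""
    rng = np.random.default_rng(seed)
    step = eps / 4
    label, s0 = predict(x, W, v), saliency(x, W, v)
    best_delta, best_score = np.zeros_like(x), 1.0
    for _ in range(n_iter):
        cand = best_delta + step * rng.choice([-1.0, 1.0], size=x.shape)
        cand = np.clip(cand, -eps, eps)  # respect the perturbation budget
        if predict(x + cand, W, v) != label:
            continue  # reject candidates that change the prediction
        score = top_k_intersection(s0, saliency(x + cand, W, v), k)
        if score <= best_score:  # accept ties so the search can drift across plateaus
            best_delta, best_score = cand, score
    return best_delta, best_score


rng = np.random.default_rng(0)
d, h = 100, 32
W = rng.normal(size=(h, d)) / np.sqrt(d)
v = rng.normal(size=h)
x = rng.normal(size=d)
delta, score = random_search_attack(x, W, v, eps=0.05, k=10)
print(predict(x + delta, W, v) == predict(x, W, v), score)
```

Random search is used here only to keep the sketch dependency-free. A gradient-based attack on the map needs the derivative of the saliency map with respect to the input, which involves the input Hessian.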

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications