Interpretation of Neural Networks is Fragile
Amirata Ghorbani, Abubakar Abid, James Zou

TL;DR
This paper shows that widely used interpretation methods for deep networks are fragile: perceptually indistinguishable inputs with the same predicted label can receive very different explanations, which calls the reliability of those explanations into question.
Contribution
The paper systematically analyzes and reveals the extreme fragility of popular interpretation methods for deep neural networks on image datasets.
Findings
Small input perturbations that leave the predicted label unchanged can drastically change feature-importance explanations.
Interpretations based on exemplars are also fragile.
Hessian matrix analysis provides insight into the fundamental nature of this fragility.
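The link between curvature and fragile feature importance can be illustrated with a minimal sketch. This is not the paper's code or experimental setup: the toy functions `linear_model` and `curved_model`, the helper names, and the perturbation size are all hypothetical, central finite differences stand in for backpropagated gradients, and top-k overlap is used here simply as one way to compare two importance rankings. To first order, a perturbation δ changes the gradient by roughly Hδ, where H is the Hessian, so a model with zero Hessian keeps its ranking while a model with one large Hessian entry does not. The sketch does not model the constraint that the predicted label stays fixed.

```python
def saliency(f, x, eps=1e-5):
    """Normalized absolute gradient of scalar f at x (central differences)."""
    grads = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append(abs((f(hi) - f(lo)) / (2 * eps)))
    total = sum(grads) or 1.0  # avoid dividing by zero on a flat function
    return [g / total for g in grads]


def top_k(scores, k):
    """Indices of the k largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def top_k_intersection(a, b, k):
    """Fraction of the top-k features shared by two importance maps."""
    return len(top_k(a, k) & top_k(b, k)) / k


def linear_model(x):
    # Hypothetical model with zero Hessian: the gradient is the same everywhere.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[2]


def curved_model(x):
    # Hypothetical model with one large Hessian entry (100) on feature 2.
    return x[0] + 0.5 * x[1] + 50.0 * x[2] ** 2


x = [0.0, 0.0, 0.0]
x_pert = [0.0, 0.0, 0.05]  # small, arbitrary perturbation of feature 2

linear_overlap = top_k_intersection(
    saliency(linear_model, x), saliency(linear_model, x_pert), k=1
)
curved_overlap = top_k_intersection(
    saliency(curved_model, x), saliency(curved_model, x_pert), k=1
)
print(linear_overlap, curved_overlap)  # prints "1.0 0.0"
```

In this toy setting the linear model's most important feature is unchanged by the perturbation, while the curved model's top feature switches from feature 0 to feature 2, because the gradient along feature 2 grows from 0 to 5.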
Abstract
In order for machine learning to be deployed and trusted in many applications, it is crucial to be able to reliably explain why the machine learning algorithm makes certain predictions. For example, if an algorithm classifies a given pathology image to be a malignant tumor, then the doctor may need to know which parts of the image led the algorithm to this classification. How to interpret black-box predictors is thus an important and active area of research. A fundamental question is: how much can we trust the interpretation itself? In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptively indistinguishable inputs with the same predicted label can be assigned very different interpretations. We systematically characterize the fragility of several widely-used feature-importance interpretation methods (saliency…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
