Attention cannot be an Explanation
Arjun R Akula, Song-Chun Zhu

TL;DR
This paper examines whether attention mechanisms can serve as reliable explanations for neural network decisions. Its human studies conclude that attention based explanations are ineffective at increasing human trust and reliance in the underlying models.
Contribution
The study provides empirical evidence from human studies that attention weights are not suitable explanations, even on tasks where they correlate well with feature importance. This challenges their usefulness for interpretability.
Findings
Prior work has shown that attention weights are frequently uncorrelated with gradient-based measures of feature importance.
Human studies show attention does not increase trust or reliance.
Attention cannot be used as an effective explanation.
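To make the comparison between attention weights and gradient-based feature importance concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration and not the paper's setup: the toy attention-pooling classifier, the names `forward`, `token_gradients`, `gradient_importance`, and `kendall_tau`, and the choice of Kendall's tau as the rank correlation (the summary above does not say which measure is used).

```python
import numpy as np


def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()


def forward(X, q, w):
    """Toy attention-pooling classifier (hypothetical, for illustration only).

    X: (T, d) token embeddings, q: (d,) attention query, w: (d,) output weights.
    """
    alpha = softmax(X @ q)                  # one attention weight per token
    c = alpha @ X                           # attention-weighted context vector
    y = 1.0 / (1.0 + np.exp(-(w @ c)))      # sigmoid output
    return y, alpha, c


def token_gradients(X, q, w):
    """Analytic gradient of the output y with respect to each token x_i."""
    y, alpha, c = forward(X, q, w)
    # d(w.c)/dx_i = alpha_i * (w + (w.x_i - w.c) * q); chain through the sigmoid.
    shift = X @ w - w @ c                   # shape (T,)
    dz = alpha[:, None] * (w[None, :] + shift[:, None] * q[None, :])
    return y * (1.0 - y) * dz               # shape (T, d)


def gradient_importance(X, q, w):
    """Gradient-based importance: L2 norm of dy/dx_i for each token."""
    return np.linalg.norm(token_gradients(X, q, w), axis=1)


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    sa = np.sign(a[:, None] - a[None, :])
    sb = np.sign(b[:, None] - b[None, :])
    # Each unordered pair is counted twice, so divide by n * (n - 1).
    return float((sa * sb).sum() / (n * (n - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 4))             # 8 tokens, 4-dimensional embeddings
    q, w = rng.normal(size=4), rng.normal(size=4)
    _, alpha, _ = forward(X, q, w)
    tau = kendall_tau(alpha, gradient_importance(X, q, w))
    print(f"Kendall tau between attention and gradient importance: {tau:.3f}")
```

A tau near 1 means attention ranks tokens the same way the gradient measure does; a value near 0 means the two rankings are unrelated. The paper's question starts from the former case and asks whether the explanation then helps people.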
Abstract
Attention based explanations (viz. saliency maps), by providing interpretability to black box models such as deep neural networks, are assumed to improve human trust and reliance in the underlying models. Recently, it has been shown that attention weights are frequently uncorrelated with gradient-based measures of feature importance. Motivated by this, we ask a follow-up question: "Assuming that we only consider the tasks where attention weights correlate well with feature importance, how effective are these attention based explanations in increasing human trust and reliance in the underlying models?". In other words, can we use attention as an explanation? We perform extensive human study experiments that aim to qualitatively and quantitatively assess the degree to which attention based explanations are suitable in increasing human trust and reliance. Our experiment results show that…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science · Adversarial Robustness in Machine Learning
