Attention is not Explanation

Sarthak Jain; Byron C. Wallace

arXiv:1902.10186·cs.CL·May 10, 2019·491 cites

TL;DR

This paper critically examines the common assumption that attention weights in neural NLP models serve as meaningful explanations, demonstrating through experiments that they do not reliably indicate feature importance or model reasoning.

Contribution

The study provides extensive empirical evidence, drawn from a variety of NLP tasks, that attention weights are not reliable explanations of model predictions, challenging their use as an interpretability tool in neural NLP models.

Findings

01

Learned attention weights are frequently uncorrelated with gradient-based measures of feature importance.

02

Very different attention distributions can nonetheless yield equivalent model predictions.

03

Attention weights largely fail to provide meaningful explanations for model predictions.
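The first finding can be made concrete with a small check: compute per-token attention weights, compute a gradient-based importance score for the same tokens, and compare the two rankings. The sketch below is illustrative only and is not the authors' code. The toy model (`toy_model`), its randomly initialised weights, the finite-difference gradient, and the use of Kendall's tau as the rank correlation are all assumptions made for the example, so the number it prints says nothing about trained models.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def toy_model(x, W_enc, q, w):
    """Hypothetical toy classifier: per-token encoder, dot-product attention, sigmoid output."""
    h = np.tanh(x @ W_enc)           # (T, d) hidden state per token
    alpha = softmax(h @ q)           # (T,) attention distribution over tokens
    context = alpha @ h              # (d,) attention-weighted sum
    y = 1.0 / (1.0 + np.exp(-(context @ w)))
    return y, alpha


def gradient_importance(x, W_enc, q, w, eps=1e-5):
    """Per-token importance: sum over input dims of |dy/dx_ij|, via central differences."""
    importance = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x_plus, x_minus = x.copy(), x.copy()
            x_plus[i, j] += eps
            x_minus[i, j] -= eps
            y_plus, _ = toy_model(x_plus, W_enc, q, w)
            y_minus, _ = toy_model(x_minus, W_enc, q, w)
            importance[i] += abs((y_plus - y_minus) / (2 * eps))
    return importance


def kendall_tau(a, b):
    """Kendall tau-a rank correlation (no tie correction)."""
    n = len(a)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return total / (n * (n - 1) / 2)


# Random, untrained parameters: sizes and seed are arbitrary choices for the sketch.
rng = np.random.default_rng(0)
T, d_in, d = 6, 4, 3
x = rng.normal(size=(T, d_in))
W_enc = rng.normal(size=(d_in, d))
q = rng.normal(size=d)
w = rng.normal(size=d)

y, alpha = toy_model(x, W_enc, q, w)
importance = gradient_importance(x, W_enc, q, w)
tau = kendall_tau(alpha, importance)
print("attention:", np.round(alpha, 3))
print("gradient importance:", np.round(importance, 3))
print("Kendall tau:", round(float(tau), 3))
```

A tau near 1 would mean attention ranks tokens the same way the gradients do; values near 0 mean the two rankings are unrelated. On a real model the same comparison would be run per example over a test set and the distribution of tau inspected.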

Abstract

Attention mechanisms have seen wide adoption in neural NLP models. In addition to improving predictive performance, these are often touted as affording transparency: models equipped with attention provide a distribution over attended-to input units, and this is often presented (at least implicitly) as communicating the relative importance of inputs. However, it is unclear what relationship exists between attention weights and model outputs. In this work, we perform extensive experiments across a variety of NLP tasks that aim to assess the degree to which attention weights provide meaningful 'explanations' for predictions. We find that they largely do not. For example, learned attention weights are frequently uncorrelated with gradient-based measures of feature importance, and one can identify very different attention distributions that nonetheless yield equivalent predictions. Our…
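The second example in the abstract, very different attention distributions yielding equivalent predictions, can be illustrated with a toy output layer. The sketch below is a hedged illustration, not the procedure used in the paper: the per-token scores, the `predict` function, the random search over Dirichlet samples, and the choice of Jensen-Shannon divergence to measure how far two attention distributions are apart are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(alpha, scores):
    """Toy output: sigmoid of the attention-weighted sum of per-token scores."""
    return sigmoid(np.dot(alpha, scores))


def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def find_counterfactual_attention(alpha, scores, eps=0.01, n_samples=5000, seed=0):
    """Random search (an assumption of this sketch) for the attention distribution
    farthest from `alpha` whose prediction stays within `eps` of the original."""
    rng = np.random.default_rng(seed)
    y = predict(alpha, scores)
    candidates = rng.dirichlet(np.ones(len(alpha)), size=n_samples)
    best, best_jsd = None, -1.0
    for cand in candidates:
        if abs(predict(cand, scores) - y) <= eps:
            d = js_divergence(alpha, cand)
            if d > best_jsd:
                best, best_jsd = cand, d
    return best, best_jsd


# Hypothetical per-token contributions to the output logit.
scores = np.array([1.0, 0.0, 2.0])
alpha = np.array([0.8, 0.1, 0.1])   # attention concentrated on token 0
alt = np.array([0.0, 0.5, 0.5])     # attention that ignores token 0 entirely

print("original prediction:", predict(alpha, scores))
print("alternative prediction:", predict(alt, scores))
print("JSD(original, alternative):", js_divergence(alpha, alt))

found, found_jsd = find_counterfactual_attention(alpha, scores)
print("searched alternative:", found, "JSD:", found_jsd)
```

In this toy case both attention vectors produce the same weighted sum, so the prediction is unchanged even though one distribution puts most of its mass on token 0 and the other puts none there. Reading either vector as "the tokens the model relied on" would give contradictory stories for an identical output.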

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning