On Explaining with Attention Matrices

Omar Naim; Nicholas Asher

arXiv:2410.18541 · cs.CL · October 25, 2024

TL;DR

This paper challenges recent arguments that attention weights in transformer models do not explain model predictions. It introduces efficient attention, which isolates the effective components of attention matrices, and demonstrates its causal role in NLP tasks.

Contribution

The paper refutes formal arguments against the explanatory relevance of attention weights and introduces efficient attention, which isolates and effectively computes the causal components of attention matrices.

Findings

1. Efficient attention matrices are probability distributions.
2. Efficient attention has a causal role in model predictions.
3. Empirical results across datasets support the effectiveness of efficient attention.

Abstract

This paper explores the much-discussed possible explanatory link between attention weights (AW) in transformer models and the models' predicted output. Contrary to intuition and early research on attention, more recent research has provided formal arguments and empirical evidence that AW are not explanatorily relevant. We show that these formal arguments are incorrect. We introduce and effectively compute efficient attention, which isolates the effective components of attention matrices in tasks and models in which AW play an explanatory role. We show that efficient attention has a causal role (it provides minimally necessary and sufficient conditions) for predicting model output in NLP tasks requiring contextual information, and, contrary to [7], we show that efficient attention matrices are probability distributions and are effectively calculable. Thus, they should play an important part in the…
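
The identifiability issue behind the formal arguments mentioned above can be illustrated numerically. The minimal NumPy sketch below is an illustration only, not code from the authors' repository. It splits a softmax attention matrix A (n x n) into a part that reaches the output A @ V and a part lying in the left null space of the value matrix V (n x d), which has no effect on the output when n exceeds the rank of V. The plain orthogonal projection and the helper names split_attention, is_row_stochastic and softmax_rows are assumptions chosen for illustration; they are not the paper's definition of efficient attention.

```python
import numpy as np


def split_attention(A: np.ndarray, V: np.ndarray, tol: float = 1e-10):
    """Split attention matrix A (n x n) against value matrix V (n x d).

    Illustrative assumption: the output-relevant part is the orthogonal
    projection of each row of A onto the column space of V. This is NOT
    claimed to be the paper's efficient attention.
    """
    # Orthonormal basis of col(V) from a thin SVD; keep numerically nonzero directions.
    U, s, _ = np.linalg.svd(V, full_matrices=False)
    rank = int(np.sum(s > tol * s.max())) if s.size else 0
    U_r = U[:, :rank]
    P = U_r @ U_r.T          # symmetric projector onto col(V), shape (n, n)
    A_out = A @ P            # part of each row that survives multiplication by V
    A_null = A - A_out       # rows lie in null(V^T), so A_null @ V is zero
    return A_out, A_null


def is_row_stochastic(M: np.ndarray, tol: float = 1e-8) -> bool:
    """True if every row of M is nonnegative and sums to 1."""
    return bool(np.all(M >= -tol) and np.allclose(M.sum(axis=1), 1.0, atol=tol))


def softmax_rows(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = np.exp(scores - scores.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 6, 2                                  # sequence longer than value dim
    A = softmax_rows(rng.normal(size=(n, n)))    # a standard attention matrix
    V = rng.normal(size=(n, d))
    A_out, A_null = split_attention(A, V)
    print("A @ V preserved:", np.allclose(A_out @ V, A @ V))
    print("null part inert:", np.allclose(A_null @ V, 0.0))
    print("A row-stochastic:", is_row_stochastic(A))
    print("projection row-stochastic:", is_row_stochastic(A_out))
```

The projected part reproduces A @ V exactly, but nothing forces its rows to be nonnegative or to sum to 1. Whether an output-relevant attention matrix can also be a probability distribution is therefore a substantive question, and the abstract reports that efficient attention is one.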

Code & Models

Repositories

omyokun/on-explaining-with-attention-matrices
PyTorch · Official

Taxonomy

Methods: Softmax · Attention Is All You Need