Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps

Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein

arXiv:2204.11073 · cs.LG · April 26, 2022

TL;DR

Grad-SAM is a gradient-based method for interpreting transformer models that analyzes their self-attention maps; across multiple benchmarks, it explains model predictions better than existing alternatives.

Contribution

Introduces Gradient Self-Attention Maps (Grad-SAM), a novel approach for explaining transformer predictions through gradient analysis of self-attention units.

Findings

1. Grad-SAM outperforms existing explanation methods on various benchmarks.
2. The method effectively identifies the input elements that influence model predictions.
3. Extensive evaluations demonstrate the superiority of Grad-SAM in interpretability tasks.

Abstract

Transformer-based language models have significantly advanced the state of the art in many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM): a novel gradient-based method that analyzes self-attention units and identifies the input elements that best explain the model's prediction. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.
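The abstract describes Grad-SAM only at a high level. The sketch below illustrates the general idea of a gradient-weighted self-attention map under stated assumptions: given the attention map A^{lh} of every layer l and head h, plus the gradient of the explained output y with respect to that map (computed with any autodiff framework), it weights each attention entry by the positive part of its gradient and averages over layers, heads, and query positions, i.e. s_j = (1/(LHn)) Σ_l Σ_h Σ_i A^{lh}_{ij} · ReLU(∂y/∂A^{lh}_{ij}). The function name `grad_sam_scores`, the (L, H, n, n) array layout, and the choice of reducing over query rows are assumptions made for illustration, not the authors' code; the paper should be consulted for its exact aggregation.

```python
import numpy as np


def grad_sam_scores(attn, grads):
    """Gradient-weighted attention relevance per token (illustrative sketch).

    attn:  array of shape (L, H, n, n), attention maps A^{lh} per layer and head.
    grads: array of the same shape, gradients dy/dA^{lh} of the explained output y.
    Returns an array of n scores, one per input token.
    """
    attn = np.asarray(attn, dtype=float)
    grads = np.asarray(grads, dtype=float)
    if attn.ndim != 4 or attn.shape != grads.shape:
        raise ValueError("attn and grads must both have shape (L, H, n, n)")
    # Keep only positive gradient evidence (ReLU) and weight each attention entry by it.
    weighted = attn * np.maximum(grads, 0.0)
    # Average over layers, heads, and query rows (assumed reduction axis),
    # leaving one score per attended-to (key) token.
    return weighted.mean(axis=(0, 1, 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, H, n = 2, 4, 5
    logits = rng.normal(size=(L, H, n, n))
    # Row-wise softmax so each query's attention sums to 1, as in a real model.
    attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    # Random stand-in for dy/dA; a real model would supply these via autodiff.
    grads = rng.normal(size=(L, H, n, n))
    print(grad_sam_scores(attn, grads))
```

In this sketch, clamping negative gradients means a token scores highly only when it is both attended to and when increasing that attention would raise y, which is the intuition behind combining attention with gradients rather than using raw attention weights alone.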
