Transformer Interpretability Beyond Attention Visualization

Hila Chefer; Shir Gur; Lior Wolf

arXiv:2012.09838 · cs.CV · April 6, 2021

TL;DR

This paper introduces a new method for interpreting Transformer models by computing relevancy scores through Deep Taylor Decomposition, effectively visualizing decision-making in both vision and text tasks.

Contribution

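It presents a relevancy propagation technique for Transformers that propagates through attention layers and skip connections while maintaining total relevancy across layers, and that outperforms existing explainability methods on visual Transformer benchmarks.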

Findings

01 Outperforms existing explainability methods on visual Transformer benchmarks

02 Effective relevancy propagation through attention layers and skip connections

03 Applicable to both vision and text Transformer models

Abstract

Self-attention techniques, and specifically Transformers, are dominating the field of text processing and are becoming increasingly popular in computer vision classification tasks. In order to visualize the parts of the image that led to a certain classification, existing methods either rely on the obtained attention maps or employ heuristic propagation along the attention graph. In this work, we propose a novel way to compute relevancy for Transformer networks. The method assigns local relevance based on the Deep Taylor Decomposition principle and then propagates these relevancy scores through the layers. This propagation involves attention layers and skip connections, which challenge existing methods. Our solution is based on a specific formulation that is shown to maintain the total relevancy across layers. We benchmark our method on very recent visual Transformer networks, as well…
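
The conservation property mentioned in the abstract (total relevancy is maintained across layers) can be illustrated on a single linear layer with a generic relevance propagation rule. This is a minimal NumPy sketch under stated assumptions, not the paper's formulation for attention layers and skip connections: the function name `lrp_linear`, the stabilizer `eps`, and the toy inputs are all hypothetical and chosen only for illustration.

```python
import numpy as np

def lrp_linear(x, w, relevance_out, eps=1e-9):
    """Redistribute relevance from the outputs of a linear layer to its inputs.

    x:             (n_in,)        layer input
    w:             (n_in, n_out)  layer weights (no bias, for simplicity)
    relevance_out: (n_out,)       relevance assigned to each output
    Generic rule for illustration only; not the paper's exact rule.
    """
    z = x[:, None] * w                      # contribution of input i to output j
    denom = z.sum(axis=0)                   # pre-activation of each output j
    # Small signed stabilizer to avoid division by zero (assumed, hypothetical value).
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)
    # Each input receives a share of output relevance proportional to its contribution.
    return (z / denom) @ relevance_out

# Toy values (assumed): 4 inputs, 3 outputs.
x = np.array([1.0, 2.0, -1.0, 0.5])
w = np.array([
    [0.5, -1.0, 0.2],
    [0.1, 0.3, -0.4],
    [-0.6, 0.8, 0.7],
    [1.0, -0.2, 0.9],
])
relevance_out = np.array([0.2, 0.5, 0.3])

relevance_in = lrp_linear(x, w, relevance_out)

# Conservation check: relevance entering the layer matches relevance leaving it.
total_in = float(relevance_in.sum())
total_out = float(relevance_out.sum())
print(total_in, total_out)
```

The two printed totals should agree up to the small stabilizer term. Extending such a rule to attention layers and skip connections is the part the abstract identifies as challenging for existing methods, and this sketch does not attempt it.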

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Multi-Head Attention · Softmax · Residual Connection · Adam · Attention Is All You Need · Byte Pair Encoding · Layer Normalization