Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers

Tobias Leemann; Alina Fastowski; Felix Pfeiffer; Gjergji Kasneci

arXiv:2405.13536·cs.LG·January 10, 2025

TL;DR

This paper proves that transformers are structurally unable to represent the linear or additive surrogate models on which traditional feature attribution methods rely. It introduces SLALOM, a surrogate model aligned with the transformer architecture that improves explanation fidelity and efficiency.

Contribution

The paper formally demonstrates the incompatibility of additive models with transformers and proposes SLALOM, a novel surrogate model tailored for transformer explanations.

Findings

01. SLALOM provides higher fidelity explanations than existing surrogates.

02. SLALOM achieves comparable explanation quality at lower computational costs.

03. Transformers are structurally incapable of representing additive surrogate models.

Abstract

We address the critical challenge of applying feature attribution methods to the transformer architecture, which dominates current applications in natural language processing and beyond. Traditional attribution methods in explainable AI (XAI) explicitly or implicitly rely on linear or additive surrogate models to quantify the impact of input features on a model's output. In this work, we formally prove an alarming incompatibility: transformers are structurally incapable of representing the linear or additive surrogate models used for feature attribution, undermining the grounding of these conventional explanation methodologies. To address this discrepancy, we introduce the Softmax-Linked Additive Log Odds Model (SLALOM), a novel surrogate model specifically designed to align with the transformer framework. SLALOM demonstrates the capacity to deliver a range of insightful explanations with…
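The contrast between an additive surrogate and a softmax-linked one can be illustrated with a small sketch. The abstract does not spell out SLALOM's functional form, so the parameterization here is an assumption inferred from the name: each token gets a hypothetical importance score (fed through a softmax) and a hypothetical value score, and the log odds are the softmax-weighted average of the value scores. The function names and the toy scores are illustrative only and are not taken from the authors' repository.

```python
import math


def softmax_linked_log_odds(tokens, importance, value):
    """Assumed SLALOM-style surrogate: softmax over per-token importance
    scores, used to weight per-token value scores (hypothetical form)."""
    scores = [importance[t] for t in tokens]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w / total * value[t] for w, t in zip(weights, tokens))


def additive_log_odds(tokens, weight):
    """Classic additive surrogate: one weight per token, summed."""
    return sum(weight[t] for t in tokens)


# Toy scores, invented for illustration.
importance = {"great": 2.0, "the": 0.0, "movie": 0.0}
value = {"great": 3.0, "the": 0.0, "movie": 0.0}

one = softmax_linked_log_odds(["great"], importance, value)
three = softmax_linked_log_odds(["great"] * 3, importance, value)
mixed = softmax_linked_log_odds(["the", "movie", "great"], importance, value)

additive_one = additive_log_odds(["great"], value)
additive_three = additive_log_odds(["great"] * 3, value)

print(one, three, mixed)
print(additive_one, additive_three)
```

Under these assumptions, repeating a token leaves the softmax-weighted output unchanged, while the additive surrogate grows linearly with the number of repetitions. This is one intuition for why a model built on softmax-normalized weights and an additive explanation can disagree; the formal argument is in the paper itself.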

Code & Models

Repositories

tleemann/slalom_explanations (PyTorch, official)

Taxonomy

Topics: Machine Learning in Materials Science

Methods: ALIGN