Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers
Tobias Leemann, Alina Fastowski, Felix Pfeiffer, Gjergji Kasneci

TL;DR
This paper proves that transformers are structurally unable to represent the linear or additive surrogate models that traditional feature attribution methods rely on, and introduces SLALOM, a surrogate model aligned with the transformer architecture that improves explanation fidelity and efficiency.
Contribution
The paper formally demonstrates the incompatibility of additive models with transformers and proposes SLALOM, a novel surrogate model tailored for transformer explanations.
Findings
SLALOM provides higher-fidelity explanations than existing surrogate models.
SLALOM achieves comparable explanation quality at lower computational costs.
Transformers are structurally incapable of representing the linear or additive surrogate models used for feature attribution.
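
A toy sketch of the intuition behind the last finding, not the paper's proof and not the SLALOM definition (neither is reproduced on this page). It assumes a readout that takes a softmax-weighted average of per-token values, which is a simplification of attention. All function names and scores are hypothetical and chosen only for illustration. An additive surrogate grows when a token is repeated, while the softmax-weighted average does not change.

```python
import math

def additive_score(tokens, weights):
    """Additive surrogate: the output is the sum of per-token contributions."""
    return sum(weights[t] for t in tokens)

def softmax_weighted_score(tokens, importance, value):
    """Toy attention-like readout: softmax-weighted average of per-token values."""
    exps = [math.exp(importance[t]) for t in tokens]
    total = sum(exps)
    return sum(e / total * value[t] for e, t in zip(exps, tokens))

# Hypothetical per-token scores (assumptions, not taken from the paper).
weights = {"good": 1.5, "movie": 0.0}
importance = {"good": 2.0, "movie": 0.0}
value = {"good": 1.5, "movie": 0.0}

short = ["good"]
long = ["good"] * 4

# The additive score scales with the number of repetitions.
additive_short = additive_score(short, weights)
additive_long = additive_score(long, weights)

# The softmax weights always sum to one, so repeating the same token
# leaves the weighted average unchanged.
softmax_short = softmax_weighted_score(short, importance, value)
softmax_long = softmax_weighted_score(long, importance, value)

print(additive_short, additive_long)
print(softmax_short, softmax_long)
```

In this sketch no choice of per-token weights lets the additive model match the averaged readout on both inputs unless the token's contribution is zero, which is the kind of mismatch the finding describes.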
Abstract
We address the critical challenge of applying feature attribution methods to the transformer architecture, which dominates current applications in natural language processing and beyond. Traditional attribution methods to explainable AI (XAI) explicitly or implicitly rely on linear or additive surrogate models to quantify the impact of input features on a model's output. In this work, we formally prove an alarming incompatibility: transformers are structurally incapable of representing linear or additive surrogate models used for feature attribution, undermining the grounding of these conventional explanation methodologies. To address this discrepancy, we introduce the Softmax-Linked Additive Log Odds Model (SLALOM), a novel surrogate model specifically designed to align with the transformer framework. SLALOM demonstrates the capacity to deliver a range of insightful explanations with…
Taxonomy
Topics: Machine Learning in Materials Science
Methods: ALIGN
