A Close Look at Decomposition-based XAI-Methods for Transformer Language Models

Leila Arras; Bruno Puri; Patrick Kahardipraja; Sebastian Lapuschkin; Wojciech Samek

arXiv:2502.15886·cs.CL·February 25, 2025

TL;DR

This paper provides a comprehensive evaluation of decomposition-based XAI methods for transformer language models, comparing ALTI-Logit, LRP, and AttnLRP, and introduces a benchmark dataset for future research.

Contribution

It conducts the first detailed quantitative and qualitative comparison of prominent decomposition-based XAI methods on language models, and releases a benchmark dataset and code.

Findings

01. ALTI-Logit and LRP show differing strengths in attribution accuracy.

02. AttnLRP offers a promising extension with improved interpretability.

03. Gradient-based methods provide complementary insights.

Abstract

Various XAI attribution methods have been recently proposed for the transformer architecture, allowing for insights into the decision-making process of large language models by assigning importance scores to input tokens and intermediate representations. One class of methods that seems very promising in this direction includes decomposition-based approaches, i.e., XAI-methods that redistribute the model's prediction logit through the network, as this value is directly related to the prediction. In the previous literature we note though that two prominent methods of this category, namely ALTI-Logit and LRP, have not yet been analyzed in juxtaposition and hence we propose to close this gap by conducting a careful quantitative evaluation w.r.t. ground truth annotations on a subject-verb agreement task, as well as various qualitative inspections, using BERT, GPT-2 and LLaMA-3 as a testbed.…
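To make the idea of redistributing a prediction logit concrete, here is a minimal sketch of layer-wise relevance propagation on a tiny bias-free ReLU network. It is not the paper's implementation and it omits every transformer-specific rule (attention, normalization layers); the function name `lrp_two_layer`, the toy weights, and the stabilizer `eps` are assumptions chosen for illustration. It only shows the defining property of decomposition-based attribution: the input relevance scores sum (approximately) to the logit being explained.

```python
def lrp_two_layer(x, W1, w2, eps=1e-9):
    """Redistribute the logit of a bias-free ReLU network onto its inputs.

    x  : list of input features
    W1 : hidden-by-input weight rows (first layer)
    w2 : one weight per hidden unit (second layer, single output logit)
    """
    # Forward pass: pre-activations, ReLU activations, output logit.
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    h = [max(0.0, zj) for zj in z]
    logit = sum(w * hj for w, hj in zip(w2, h))

    # The output layer is linear, so each hidden unit's relevance is
    # simply its contribution to the logit.
    R_hidden = [w * hj for w, hj in zip(w2, h)]

    # Pass each hidden relevance down to the inputs in proportion to
    # the input contributions W1[j][i] * x[i] to that unit.
    R_input = [0.0] * len(x)
    for row, zj, Rj in zip(W1, z, R_hidden):
        if Rj == 0.0:
            continue  # inactive units carry no relevance
        denom = zj + (eps if zj >= 0 else -eps)  # stabilized denominator
        for i, (w, xi) in enumerate(zip(row, x)):
            R_input[i] += (w * xi / denom) * Rj
    return logit, R_input


# Toy example (hypothetical values): three inputs, two hidden units.
x = [1.0, 2.0, -1.0]
W1 = [[1.0, 0.5, 0.0], [-1.0, 1.0, 2.0]]
w2 = [2.0, -1.0]
logit, relevance = lrp_two_layer(x, W1, w2)
print(logit, relevance, sum(relevance))
```

In this toy case the second hidden unit is inactive, so all relevance flows through the first unit and the third input receives none. The methods compared in the paper differ in how they handle the components this sketch leaves out.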

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Layer · WordPiece · Linear Warmup With Linear Decay · Multi-Head Attention · BERT · Adam · Softmax · Dropout