Learning to Explain: Supervised Token Attribution from Transformer Attention Patterns

George Mihaila

arXiv:2601.14112·cs.CL·January 22, 2026

TL;DR

This paper introduces ExpNet, a neural network that learns to generate token importance scores from transformer attention patterns, improving interpretability in high-stakes AI applications.

Contribution

ExpNet automatically learns optimal attention feature combinations for token attribution, surpassing manual and black-box explanation methods.

Findings

01

ExpNet outperforms existing attention-based explanation methods.

02

It demonstrates strong generalization across multiple tasks.

03

ExpNet reduces reliance on fixed aggregation rules.

Abstract

Explainable AI (XAI) has become critical as transformer-based models are deployed in high-stakes applications including healthcare, legal systems, and financial services, where opacity hinders trust and accountability. Transformers' self-attention mechanisms have proven valuable for model interpretability, with attention weights successfully used to understand model focus and behavior (Xu et al., 2015; Wiegreffe and Pinter, 2019). However, existing attention-based explanation methods rely on manually defined aggregation strategies and fixed attribution rules (Abnar and Zuidema, 2020a; Chefer et al., 2021), while model-agnostic approaches (LIME, SHAP) treat the model as a black box and incur significant computational costs through input perturbation. We introduce Explanation Network (ExpNet), a lightweight neural network that learns an explicit mapping from transformer attention…
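
To make the phrase "manually defined aggregation strategies" concrete, the sketch below implements attention rollout, a well-known fixed aggregation rule for attention weights. It is an illustration of the kind of hand-written baseline the abstract contrasts with, not ExpNet itself. The function name `attention_rollout`, the tensor layout `(layers, heads, seq, seq)`, the equal 0.5/0.5 residual weighting, and the random toy attention are all assumptions made for this example.

```python
import numpy as np

def attention_rollout(attn):
    """Fixed-rule aggregation of attention into token relevance.

    attn: array of shape (layers, heads, seq, seq) whose rows sum to 1
    (this layout is an assumption made for the sketch).
    Returns a (seq, seq) matrix; row i gives how much each input token
    contributes to position i after all layers.
    """
    num_layers, _, seq_len, _ = attn.shape
    identity = np.eye(seq_len)
    rollout = identity.copy()
    for layer in range(num_layers):
        a = attn[layer].mean(axis=0)            # hand-picked rule: average the heads
        a = 0.5 * a + 0.5 * identity            # hand-picked rule: model the residual path
        a = a / a.sum(axis=-1, keepdims=True)   # keep every row a distribution
        rollout = a @ rollout                   # compose with the earlier layers
    return rollout

# Toy input: random attention for 4 layers, 2 heads, 5 tokens (illustrative only).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 2, 5, 5))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Relevance of each token with respect to position 0 (e.g. a [CLS]-style token).
scores = attention_rollout(attn)[0]
```

Every step marked "hand-picked rule" is a design decision fixed in advance by the method's author. A learned approach of the kind the abstract describes would instead fit such combinations from data.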

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education