Explanation Bias is a Product: Revealing the Hidden Lexical and Position Preferences in Post-Hoc Feature Attribution

Jonathan Kamp; Roos Bakker; Dominique Blok

arXiv:2512.11108 · cs.CL · April 21, 2026

TL;DR

This paper investigates biases in feature attribution explanations for language models. It shows how lexical and positional biases vary across attribution methods and models, and how these biases affect the trustworthiness of the explanations.

Contribution

It introduces a model- and method-agnostic framework with evaluation metrics to systematically assess lexical and position biases in explanations.
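The three evaluation metrics are not spelled out in this summary, so the following is only a hypothetical illustration of what measuring "where" (position) and "what" (lexical) preferences in token-level attributions could look like. It is not the authors' framework. The functions `position_profile`, `position_bias`, and `lexical_profile` and the toy `examples` are invented for this sketch. Here, position bias is scored as the total variation distance between the average share of attribution mass per relative-position bin and a uniform distribution. Lexical preference is summarized as the mean normalized attribution each token type receives.

```python
from collections import defaultdict


def position_profile(attributions, n_bins=4):
    """Hypothetical: average share of |attribution| mass per relative-position bin."""
    totals = [0.0] * n_bins
    used = 0
    for scores in attributions:
        mass = sum(abs(s) for s in scores)
        if mass == 0:
            continue  # skip examples that carry no attribution signal
        used += 1
        n = len(scores)
        for i, s in enumerate(scores):
            b = i * n_bins // n  # relative position -> bin index in [0, n_bins)
            totals[b] += abs(s) / mass
    if used == 0:
        return [0.0] * n_bins
    return [t / used for t in totals]


def position_bias(profile):
    """Hypothetical: total variation distance from a uniform positional profile."""
    k = len(profile)
    return 0.5 * sum(abs(p - 1.0 / k) for p in profile)


def lexical_profile(examples):
    """Hypothetical: mean normalized |attribution| per token type."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tokens, scores in examples:
        mass = sum(abs(s) for s in scores)
        if mass == 0:
            continue
        for tok, s in zip(tokens, scores):
            sums[tok] += abs(s) / mass
            counts[tok] += 1
    return {tok: sums[tok] / counts[tok] for tok in sums}


# Toy data (invented): each example pairs tokens with attribution scores.
examples = [
    (["the", "cat", "sat", "down"], [0.7, 0.1, 0.1, 0.1]),
    (["a", "dog", "ran", "off"], [0.4, 0.2, 0.2, 0.2]),
]
profile = position_profile([scores for _, scores in examples])
lex = lexical_profile(examples)
print(profile, position_bias(profile))
print(sorted(lex.items(), key=lambda kv: -kv[1])[:3])
```

In this toy data, both explainers concentrate mass on the first token, so the positional profile is skewed toward the first bin. A metric of this kind alone cannot tell whether that reflects a preference for a position or for the word that happens to occupy it. Disentangling the two is why the "what" and "where" questions are assessed separately.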

Findings

1. A trade-off exists between lexical and position biases in models.
2. Models scoring high on one bias tend to score low on the other.
3. Anomalous explanations are more prone to bias.

Abstract

Good-quality explanations strengthen the understanding of language models and data. Feature attribution methods, such as Integrated Gradients, are a type of post-hoc explainer that can provide token-level insights. However, explanations for the same input may vary greatly due to the underlying biases of different methods. Users who are aware of this issue may mistrust their utility, while unaware users may trust them inadequately. In this work, we delve beyond the superficial inconsistencies between attribution methods, structuring their biases through a model- and method-agnostic framework of three evaluation metrics. We systematically assess both lexical and position bias (what and where in the input) for two transformers: first, in a controlled, pseudo-random classification task on artificial data; then, in a semi-controlled causal relation detection task on natural data. We find a…
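Integrated Gradients, cited above as an example feature attribution method, attributes a model output to input features. It averages the gradient along a straight path from a baseline x' to the input x, then scales by (x − x'). The sketch below is a minimal pure-Python illustration using a midpoint Riemann sum. `toy_model` and `toy_grad` are a hypothetical differentiable function and its hand-written gradient, standing in for a real language model. In practice, each feature would be a token embedding and gradients would come from autograd; libraries such as Captum provide maintained implementations.

```python
def integrated_gradients(f_grad, x, baseline, steps=50):
    """Midpoint-rule approximation of Integrated Gradients.

    f_grad(z) returns the gradient of the scalar model output w.r.t. each feature of z.
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        z = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = f_grad(z)
        for i in range(n):
            avg_grad[i] += g[i] / steps  # running average of path gradients
    # Scale the averaged gradient by the input's offset from the baseline.
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


def toy_model(z):
    """Hypothetical nonlinear scalar model over three 'token' features."""
    return 2 * z[0] + z[0] * z[1] - z[2] ** 2


def toy_grad(z):
    """Analytic gradient of toy_model."""
    return [2 + z[1], z[0], -2 * z[2]]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(toy_grad, x, baseline)
print(attr, sum(attr), toy_model(x) - toy_model(baseline))
```

The printed sum illustrates the completeness property of Integrated Gradients: attributions add up to F(x) − F(x'). Attributions from different methods can still disagree on which tokens matter, which is the inconsistency the abstract describes.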
