"Faithful to What?" On the Limits of Fidelity-Based Explanations

Jackson Eshbaugh

arXiv:2506.12176·cs.LG·April 21, 2026

TL;DR

This paper introduces the linearity score to evaluate whether surrogate models truly capture the data's underlying structure, revealing limitations of fidelity-based explanations in AI.

Contribution

The work proposes the linearity score as a diagnostic tool to assess the true explanatory power of surrogate models beyond fidelity measures.

Findings

01

High fidelity to neural networks does not guarantee capturing task-relevant data structure.

02

High-fidelity surrogates can underperform linear baselines trained directly on the data.

03

Fidelity-based explanations may misrepresent a model's understanding of the underlying task.

Abstract

In explainable AI, surrogate models are commonly evaluated by their fidelity to a neural network's predictions. Fidelity, however, measures alignment to a learned model rather than alignment to the data-generating signal underlying the task. This work introduces the linearity score $\lambda(f)$, a diagnostic that quantifies the extent to which a regression network's input-output behavior is linearly decodable. $\lambda(f)$ is defined as an $R^2$ measure of surrogate fit to the network. Across synthetic and real-world regression datasets, we find that surrogates can achieve high fidelity to a neural network while failing to recover the predictive gains that distinguish the network from simpler models. In several cases, high-fidelity surrogates underperform even linear baselines trained directly on the data. These results demonstrate that explaining a model's behavior is not equivalent…
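
As an illustration of the definition above, the following is a minimal sketch of how a score of this kind could be computed. It assumes the surrogate is an ordinary least-squares linear model fit to the network's own predictions, and that the score is the in-sample R^2 of that fit. The helper names (`fit_linear`, `linearity_score`) and the two stand-in "networks" are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def r2_score(target, pred):
    """Coefficient of determination of `pred` with respect to `target`."""
    target = np.asarray(target, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_linear(X, targets):
    """Fit an affine least-squares model and return its predict function."""
    design = np.column_stack([X, np.ones(len(X))])  # append intercept column
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)

    def predict(X_new):
        return np.column_stack([X_new, np.ones(len(X_new))]) @ coef

    return predict


def linearity_score(network_predict, X):
    """Assumed form of the score: R^2 of a linear surrogate fit to f(X).

    The surrogate is trained on the network's outputs, not on the labels,
    so this measures fidelity to the network only.
    """
    f_x = network_predict(X)
    surrogate = fit_linear(X, f_x)
    return r2_score(f_x, surrogate(X))


# Hypothetical stand-ins for trained regression networks.
def linear_net(X):
    return 2.0 * X[:, 0] - X[:, 1] + 0.5


def curved_net(X):
    return X[:, 0] ** 2


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))

lam_linear = linearity_score(linear_net, X)
lam_curved = linearity_score(curved_net, X)
print(f"linear stand-in: {lam_linear:.3f}")
print(f"curved stand-in: {lam_curved:.3f}")
```

Under these assumptions, a function that is exactly affine in its inputs scores essentially 1, while a purely quadratic one on symmetric inputs scores much lower. Neither number says anything about how well the network or the surrogate predicts the true labels, which is the gap the abstract points to.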

Code & Models

Repositories

jacksoneshbaugh/lambda-linearity-score (GitHub)
