TL;DR
This paper introduces the linearity score, a diagnostic for whether surrogate models of a neural network capture the structure of the data rather than only the network's outputs, and uses it to expose limitations of fidelity-based explanations in AI.
Contribution
The work proposes the linearity score as a diagnostic tool that assesses the explanatory power of surrogate models beyond what fidelity measures capture.
Findings
High fidelity to neural networks does not guarantee capturing task-relevant data structure.
High-fidelity surrogates can underperform even linear baselines trained directly on the data.
Fidelity-based explanations may misrepresent a model's understanding of the underlying task.
Abstract
In explainable AI, surrogate models are commonly evaluated by their fidelity to a neural network's predictions. Fidelity, however, measures alignment to a learned model rather than alignment to the data-generating signal underlying the task. This work introduces the linearity score, a diagnostic that quantifies the extent to which a regression network's input-output behavior is linearly decodable. The score is defined as a measure of surrogate fit to the network. Across synthetic and real-world regression datasets, we find that surrogates can achieve high fidelity to a neural network while failing to recover the predictive gains that distinguish the network from simpler models. In several cases, high-fidelity surrogates underperform even linear baselines trained directly on the data. These results demonstrate that explaining a model's behavior is not equivalent…
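The abstract defines the linearity score only as a measure of surrogate fit to the network; the exact formula is not given in this summary. The sketch below is therefore a minimal illustration under stated assumptions: it takes the score to be the R² of an ordinary least-squares linear surrogate fitted to the network's predictions, and contrasts that fidelity with task R² on held-out labels for the network, the surrogate, and a linear baseline trained directly on the data. The function names (`fit_linear`, `linearity_score`, `compare`), the synthetic target, and the idealized "network" are hypothetical and are not the paper's code.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a prediction function."""
    A = np.column_stack([X, np.ones(len(X))])  # append a bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict(Z):
        return np.column_stack([Z, np.ones(len(Z))]) @ coef

    return predict


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def linearity_score(X, network_predict):
    """ASSUMED definition: R^2 of a linear surrogate fit to the network's outputs."""
    y_net = network_predict(X)
    surrogate = fit_linear(X, y_net)
    return r2(y_net, surrogate(X)), surrogate


def compare(X_train, y_train, X_test, y_test, network_predict):
    """Contrast surrogate fidelity with task performance on held-out data."""
    fidelity, surrogate = linearity_score(X_train, network_predict)
    baseline = fit_linear(X_train, y_train)  # linear model trained on the labels
    return {
        "fidelity": fidelity,
        "network_task_r2": r2(y_test, network_predict(X_test)),
        "surrogate_task_r2": r2(y_test, surrogate(X_test)),
        "baseline_task_r2": r2(y_test, baseline(X_test)),
    }


def target_function(X):
    """Hypothetical nonlinear data-generating signal used for illustration."""
    return X[:, 0] + np.sin(3 * X[:, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_tr = rng.uniform(-1, 1, (2000, 2))
    X_te = rng.uniform(-1, 1, (1000, 2))
    y_tr = target_function(X_tr) + rng.normal(0, 0.1, len(X_tr))
    y_te = target_function(X_te) + rng.normal(0, 0.1, len(X_te))
    # Hypothetical stand-in for a trained network: an idealized model that
    # has learned the noise-free target exactly.
    network = target_function
    for name, value in compare(X_tr, y_tr, X_te, y_te, network).items():
        print(f"{name}: {value:.3f}")
```

In this toy setup the linear surrogate tracks the network reasonably well, yet its task R² stays clearly below the network's, so a fidelity number alone hides the nonlinear gain. Because the idealized network here matches the true signal, the surrogate ends up close to the linear baseline; the paper's stronger observation, a high-fidelity surrogate falling below that baseline, would need a network whose behavior departs from the data's structure, which this sketch does not model.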
