The Dead Salmons of AI Interpretability

Maxime Méloux; Giada Dirupo; François Portet; Maxime Peyrard

arXiv:2512.18792·cs.AI·December 23, 2025

Open Access

TL;DR

This paper highlights pitfalls of current AI interpretability methods by analogy with the "dead salmon" neuroscience study, in which standard analyses produced spurious findings, and advocates a statistical-causal framework to improve reliability and scientific rigor.

Contribution

It introduces a pragmatic statistical-causal perspective for AI interpretability, emphasizing the importance of testing explanations against explicit hypotheses and quantifying uncertainty.

Findings

01. Interpretability methods can produce plausible artifacts on random models.

02. A statistical framework helps distinguish meaningful explanations from noise.

03. Identifiability issues threaten the reliability of interpretability claims.
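
The first two findings can be illustrated with a small, self-contained sketch. It is not the authors' method: the nearest-class-mean probe, the one-layer random network, the sizes, and all function names are assumptions chosen for brevity. A probe is fit on features from an untrained network using labels that are unrelated to the inputs, and its in-sample accuracy is then compared against a null distribution obtained by shuffling the labels.

```python
import random

def random_features(x, weights):
    # One hidden layer with fixed random weights and ReLU: an untrained "model".
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def probe_accuracy(feats, labels):
    # Nearest-class-mean probe, fit and scored on the same data (in-sample).
    dim = len(feats[0])
    means = {}
    for c in set(labels):
        rows = [f for f, y in zip(feats, labels) if y == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def predict(f):
        return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(f, means[c])))

    return sum(predict(f) == y for f, y in zip(feats, labels)) / len(labels)

def permutation_p_value(feats, labels, n_perm=200, seed=0):
    # Null hypothesis: the features carry no information about the labels.
    rng = random.Random(seed)
    observed = probe_accuracy(feats, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if probe_accuracy(feats, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

rng = random.Random(42)
n, d_in, d_hidden = 40, 5, 30  # hypothetical sizes
weights = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
inputs = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(n)]
feats = [random_features(x, weights) for x in inputs]
labels = [rng.randint(0, 1) for _ in range(n)]  # labels unrelated to the inputs

acc, p = permutation_p_value(feats, labels)
print(f"in-sample probe accuracy: {acc:.2f}, permutation p-value: {p:.3f}")
```

Read on its own, the in-sample accuracy can look like evidence that the random network encodes the label. The permutation p-value places that number in the context of what the same probe achieves on shuffled labels, which is the kind of explicit hypothesis test the summary above calls for.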

Abstract

In a striking neuroscience study, the authors placed a dead salmon in an MRI scanner and showed it images of humans in social situations. Astonishingly, standard analyses of the time reported brain regions predictive of social emotions. The explanation, of course, was not supernatural cognition but a cautionary tale about misapplied statistical inference. In AI interpretability, reports of similar "dead salmon" artifacts abound: feature attribution, probing, sparse auto-encoding, and even causal analyses can produce plausible-looking explanations for randomly initialized neural networks. In this work, we examine this phenomenon and argue for a pragmatic statistical-causal reframing: explanations of computational systems should be treated as parameters of a (statistical) model, inferred from computational traces. This perspective goes beyond simply measuring statistical variability of…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition · Face Recognition and Perception