The Dead Salmons of AI Interpretability
Maxime Méloux, Giada Dirupo, François Portet, Maxime Peyrard

TL;DR
This paper highlights the pitfalls of current AI interpretability methods by comparing them to flawed neuroscience studies, advocating for a statistical-causal framework to improve reliability and scientific rigor.
Contribution
It introduces a pragmatic statistical-causal perspective for AI interpretability, emphasizing the importance of testing explanations against explicit hypotheses and quantifying uncertainty.
Findings
Interpretability methods can produce plausible-looking artifacts on randomly initialized models.
A statistical framework helps distinguish meaningful explanations from noise.
Identifiability issues threaten the reliability of interpretability claims.
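To make the idea of testing an explanation against an explicit hypothesis concrete, here is a minimal sketch of a permutation test for a linear probe. It is a generic statistical illustration, not the authors' procedure: the function names (`probe_accuracy`, `permutation_p_value`), the least-squares probe, and the synthetic data are all assumptions introduced for this example. The null hypothesis is that the features carry no information about the labels, so the observed probe accuracy is compared with accuracies obtained after shuffling the labels.

```python
import numpy as np


def probe_accuracy(features, labels, rng):
    """Fit a least-squares linear probe on a random half, score on the other half."""
    n = len(labels)
    order = rng.permutation(n)
    train, test = order[: n // 2], order[n // 2:]
    design = np.hstack([features, np.ones((n, 1))])  # append a bias column
    targets = 2.0 * labels - 1.0                     # map {0, 1} to {-1, +1}
    weights, *_ = np.linalg.lstsq(design[train], targets[train], rcond=None)
    preds = (design[test] @ weights > 0).astype(int)
    return float(np.mean(preds == labels[test]))


def permutation_p_value(features, labels, n_permutations=200, seed=0):
    """Return (observed accuracy, p-value) under a label-shuffling null."""
    rng = np.random.default_rng(seed)
    observed = probe_accuracy(features, labels, rng)
    null = np.array([
        probe_accuracy(features, rng.permutation(labels), rng)
        for _ in range(n_permutations)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, float(p_value)


# Hypothetical synthetic data: one feature set encodes the label, one is pure noise.
data_rng = np.random.default_rng(42)
labels = data_rng.integers(0, 2, size=200)
noise_features = data_rng.normal(size=(200, 5))
signal_features = data_rng.normal(size=(200, 5))
signal_features[:, 0] += 4.0 * (2 * labels - 1)  # shift first column by class

acc_signal, p_signal = permutation_p_value(signal_features, labels)
acc_noise, p_noise = permutation_p_value(noise_features, labels)
print(f"signal: accuracy={acc_signal:.2f}, p={p_signal:.3f}")
print(f"noise:  accuracy={acc_noise:.2f}, p={p_noise:.3f}")
```

A label-shuffling null only asks whether the features are related to the labels at all. Whether a trained model encodes more than a randomly initialized one is a different hypothesis, and it would need a null distribution built from random models instead.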
Abstract
In a striking neuroscience study, the authors placed a dead salmon in an MRI scanner and showed it images of humans in social situations. Astonishingly, standard analyses of the time reported brain regions predictive of social emotions. The explanation, of course, was not supernatural cognition but a cautionary tale about misapplied statistical inference. In AI interpretability, reports of similar "dead salmon" artifacts abound: feature attribution, probing, sparse auto-encoding, and even causal analyses can produce plausible-looking explanations for randomly initialized neural networks. In this work, we examine this phenomenon and argue for a pragmatic statistical-causal reframing: explanations of computational systems should be treated as parameters of a (statistical) model, inferred from computational traces. This perspective goes beyond simply measuring statistical variability of…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition · Face Recognition and Perception
