Choosing the right basis for interpretability: Psychophysical comparison between neuron-based and dictionary-based representations

Julien Colin; Lore Goetschalckx; Thomas Fel; Victor Boutin; Thomas Serre; Nuria Oliver

arXiv:2411.03993·cs.CV·March 17, 2026

TL;DR

This study compares neuron-based and dictionary-based representations in neural networks and finds that dictionary-based representations are more interpretable to humans, especially in deeper layers, which challenges the neuron-centric interpretability paradigm.

Contribution

The paper provides large-scale psychophysical evidence (three online experiments, N=481) that dictionary-based representations outperform neuron-based ones in interpretability, highlighting the importance of basis choice in explainability.

Findings

1. Dictionary-based representations are more interpretable to humans than neuron-based ones.

2. The interpretability advantage of dictionary-based representations increases in deeper network layers.

3. How interpretable ResNet50 appears depends on the basis used, which points to superposition in its neuron basis.

Abstract

Interpretability research often adopts a neuron-centric lens, treating individual neurons as the fundamental units of explanation. However, neuron-level explanations can be undermined by superposition, where single units respond to mixtures of unrelated patterns. Dictionary learning methods, such as sparse autoencoders and non-negative matrix factorization, offer a promising alternative by learning a new basis over layer activations. Despite this promise, direct human evaluations comparing neuron-based and dictionary-based representations remain limited. We conducted three large-scale online psychophysics experiments (N=481) comparing explanations derived from neuron-based and dictionary-based representations in two convolutional neural networks (ResNet50, VGG16). We operationalize interpretability via visual coherence: a basis is more interpretable if humans can reliably recognize a…
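
The idea of learning a new basis over layer activations, as mentioned in the abstract, can be illustrated with a minimal sketch of non-negative matrix factorization. This is not the authors' pipeline: the synthetic `activations` matrix, the helper name `nmf`, the rank `k=3`, and the use of Lee-Seung multiplicative updates are all assumptions made for illustration. The sketch factors a non-negative activation matrix into per-sample codes and a small dictionary whose rows act as candidate basis directions in neuron space.

```python
import numpy as np

def nmf(A, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix A (samples x neurons) as codes @ dictionary.

    Hypothetical helper: uses Lee-Seung multiplicative updates for the
    Frobenius reconstruction loss. Both factors stay non-negative.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_neurons = A.shape
    codes = rng.random((n_samples, k))
    dictionary = rng.random((k, n_neurons))
    for _ in range(n_iter):
        # Update the codes while holding the dictionary fixed.
        codes *= (A @ dictionary.T) / (codes @ dictionary @ dictionary.T + eps)
        # Update the dictionary while holding the codes fixed.
        dictionary *= (codes.T @ A) / (codes.T @ codes @ dictionary + eps)
    return codes, dictionary

# Synthetic stand-in for post-ReLU layer activations (assumption, not real data):
# 100 samples, 8 neurons, generated from 3 underlying non-negative directions.
rng = np.random.default_rng(1)
activations = rng.random((100, 3)) @ rng.random((3, 8))

codes, dictionary = nmf(activations, k=3)

# Relative reconstruction error of the learned basis.
rel_error = (np.linalg.norm(activations - codes @ dictionary)
             / np.linalg.norm(activations))
print(codes.shape, dictionary.shape, rel_error)
```

In the neuron basis, each of the 8 columns of `activations` would be inspected on its own; in the dictionary basis, each of the 3 rows of `dictionary` is a direction that mixes several neurons, and `codes` says how strongly each sample expresses it.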

Taxonomy

Topics: Biomedical Text Mining and Ontologies

Methods: ALIGN