Choosing the right basis for interpretability: Psychophysical comparison between neuron-based and dictionary-based representations
Julien Colin, Lore Goetschalckx, Thomas Fel, Victor Boutin, Thomas Serre, Nuria Oliver

TL;DR
This study compares neuron-based and dictionary-based representations in neural networks and finds that dictionary-based representations are more interpretable to humans, especially in deeper layers, which challenges the neuron-centric interpretability paradigm.
Contribution
The paper provides large-scale psychophysical evidence (three online experiments, N=481) that dictionary-based representations are more interpretable than neuron-based ones, underscoring that the choice of basis matters for explainability.
Findings
Dictionary-based representations are more interpretable than neuron-based ones.
The interpretability advantage of dictionary-based representations grows in deeper network layers.
ResNet50's measured interpretability depends on the basis used, which points to superposition in its neuron basis.
Abstract
Interpretability research often adopts a neuron-centric lens, treating individual neurons as the fundamental units of explanation. However, neuron-level explanations can be undermined by superposition, where single units respond to mixtures of unrelated patterns. Dictionary learning methods, such as sparse autoencoders and non-negative matrix factorization, offer a promising alternative by learning a new basis over layer activations. Despite this promise, direct human evaluations comparing neuron-based and dictionary-based representations remain limited. We conducted three large-scale online psychophysics experiments (N=481) comparing explanations derived from neuron-based and dictionary-based representations in two convolutional neural networks (ResNet50, VGG16). We operationalize interpretability via visual coherence: a basis is more interpretable if humans can reliably recognize a…
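As a rough illustration of what "learning a new basis over layer activations" can look like, the sketch below runs a non-negative matrix factorization on a synthetic activation matrix. Everything here is an assumption made for illustration: the function name `nmf`, the multiplicative-update solver, the random stand-in data, and the choice of `k` are not taken from the paper, which may use different solvers and settings.

```python
import numpy as np

def nmf(A, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix A (samples x neurons) as A ~= U @ D.

    Rows of D (k x neurons) play the role of dictionary directions;
    U (samples x k) holds the non-negative codes for each sample.
    Hypothetical helper: uses Lee-Seung multiplicative updates
    for the squared Frobenius reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    U = rng.random((n, k)) + eps
    D = rng.random((k, d)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio,
        # so U and D stay non-negative throughout.
        U *= (A @ D.T) / (U @ D @ D.T + eps)
        D *= (U.T @ A) / (U.T @ U @ D + eps)
    return U, D

def rel_error(A, U, D):
    """Relative Frobenius reconstruction error."""
    return np.linalg.norm(A - U @ D) / np.linalg.norm(A)

# Synthetic stand-in for post-ReLU activations: 100 samples, 8 "neurons",
# generated from 3 underlying non-negative directions (an assumption).
rng = np.random.default_rng(1)
A = rng.random((100, 3)) @ rng.random((3, 8))

U0, D0 = nmf(A, k=3, n_iter=0)    # random initialization only
U, D = nmf(A, k=3, n_iter=200)    # same initialization, then fitted

err_init = rel_error(A, U0, D0)
err_final = rel_error(A, U, D)
print(f"relative error: init={err_init:.3f}, fitted={err_final:.3f}")
```

In this framing, the neuron basis corresponds to reading the columns of `A` directly, while the dictionary basis reads the columns of `U`, each tied to one learned direction in `D`.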
Taxonomy
Topics: Biomedical Text Mining and Ontologies
Methods: ALIGN
