Scale Alone Does not Improve Mechanistic Interpretability in Vision Models
Roland S. Zimmermann, Thomas Klein, Wieland Brendel

TL;DR
Scaling vision models does not improve mechanistic interpretability. Newer models are less interpretable than older ones, which underscores the need for models designed explicitly for interpretability and for better evaluation methods.
Contribution
This study systematically evaluates the impact of scale on interpretability in vision models, revealing no positive correlation and highlighting the importance of designing inherently interpretable models.
Findings
Interpretability shows no scaling effect, neither with model size nor with dataset size.
Modern vision models are less interpretable than older architectures.
A large dataset of human responses was released to aid interpretability research.
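One common way to check for a scaling effect like the one described in the findings above is a rank correlation between a scale measure and an interpretability score. The following is a minimal sketch only: the paper's actual statistical analysis is not described here, the choice of Spearman correlation is an assumption, and the names `ranks`, `spearman`, `toy_sizes`, and `toy_scores` are hypothetical. The numbers are placeholders, not results from the paper.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of entries tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (std_x * std_y)


# Hypothetical placeholder values, NOT measurements from the paper.
toy_sizes = [1, 10, 100, 1000]      # e.g. a scale measure per model
toy_scores = [0.8, 0.7, 0.9, 0.6]   # e.g. an interpretability score per model
rho = spearman(toy_sizes, toy_scores)
print(f"Spearman rho on toy data: {rho:.2f}")
```

A value of rho near zero or below would be consistent with no positive scaling effect. With only nine models, as in the study, any such estimate would carry wide uncertainty.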
Abstract
In light of the recent widespread adoption of AI systems, understanding the internal information processing of neural networks has become increasingly critical. Most recently, machine vision has seen remarkable progress by scaling neural networks to unprecedented levels in dataset and model size. We here ask whether this extraordinary increase in scale also positively impacts the field of mechanistic interpretability. In other words, has our understanding of the inner workings of scaled neural networks improved as well? We use a psychophysical paradigm to quantify one form of mechanistic interpretability for a diverse suite of nine models and find no scaling effect for interpretability - neither for model nor dataset size. Specifically, none of the investigated state-of-the-art models are easier to interpret than the GoogLeNet model from almost a decade ago. Latest-generation vision…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
Methods: Auxiliary Classifier · Local Response Normalization · Convolution · Dense Connections · Dropout · Softmax · 1x1 Convolution · Inception Module · Max Pooling · Average Pooling
