Sparse but not Simpler: A Multi-Level Interpretability Analysis of Vision Transformers
Siyu Zhang

TL;DR
This paper systematically evaluates whether structural sparsity in Vision Transformers improves interpretability. It finds that pruning reduces circuit complexity but does not improve interpretability across the levels examined.
Contribution
Introduces IMPACT, a comprehensive multi-level framework for evaluating interpretability in vision models, and provides empirical evidence that sparsity alone does not improve interpretability.
Findings
Sparse models have fewer circuit edges but similar or more active nodes.
Pruning redistributes computation rather than simplifying functions.
No significant improvements in neuron selectivity or attribution faithfulness.
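The first finding can be illustrated with a toy sketch. This sketch assumes, and the assumption is ours rather than the paper's definition, that a circuit is a set of kept nodes plus the nonzero weights connecting them. Under that assumption, pruning removes edges by zeroing weights, while the set of nodes a mask keeps can stay the same. `circuit_size` is a hypothetical helper, and the paper's actual node and edge definitions may differ.

```python
def circuit_size(weights, node_masks):
    """Count active nodes and edges of a masked circuit (illustrative sketch).

    weights[l][i][j] connects node j in layer l to node i in layer l + 1.
    node_masks[l][j] is True when node j of layer l is kept in the circuit;
    there is one mask per layer of nodes, i.e. len(weights) + 1 masks.
    An edge counts only if its weight is nonzero and both endpoints are kept.
    """
    nodes = sum(sum(mask) for mask in node_masks)
    edges = 0
    for l, w in enumerate(weights):
        for i, row in enumerate(w):
            for j, v in enumerate(row):
                if v != 0.0 and node_masks[l][j] and node_masks[l + 1][i]:
                    edges += 1
    return nodes, edges


masks = [[True, True], [True, True]]
dense = [[[1.0, 0.5], [0.3, 2.0]]]
sparse = [[[1.0, 0.0], [0.0, 2.0]]]  # same layer after zeroing two weights
print(circuit_size(dense, masks))   # (4, 4)
print(circuit_size(sparse, masks))  # (4, 2)
```

In this toy case the node count is unchanged while the edge count halves. This mirrors the pattern reported above, although it says nothing about why the paper observed it.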
Abstract
Sparse neural networks are often hypothesized to be more interpretable than dense models, motivated by findings that weight sparsity can produce compact circuits in language models. However, it remains unclear whether structural sparsity itself leads to improved semantic interpretability. In this work, we systematically evaluate the relationship between weight sparsity and interpretability in Vision Transformers using DeiT-III B/16 models pruned with Wanda. To assess interpretability comprehensively, we introduce IMPACT, a multi-level framework that evaluates interpretability across four complementary levels: neurons, layer representations, task circuits, and model-level attribution. Layer representations are analyzed using BatchTopK sparse autoencoders, circuits are extracted via learnable node masking, and explanations are evaluated with transformer attribution using…
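The abstract names two concrete selection rules. Wanda (Pruning by Weights and activations) scores each weight as |W_ij| · ‖X_j‖₂, where ‖X_j‖₂ is the norm of input feature j over a calibration set. It then removes the lowest-scoring weights within each output row. A BatchTopK sparse autoencoder keeps the k × batch_size largest latent activations across the whole batch, not exactly k per sample. The following is a minimal pure-Python sketch of both rules, assuming nested lists instead of tensors. The function names are hypothetical and this is not the paper's code. The calibration data, sparsity levels, and SAE configuration used in the study are not stated here.

```python
import math


def wanda_prune(weight, calib_inputs, sparsity):
    """Sketch of Wanda pruning on a single linear layer.

    weight[i][j] maps input feature j to output i.
    calib_inputs is a list of input activation vectors (one per token).
    sparsity is the fraction of weights removed in each output row.
    """
    n_in = len(weight[0])
    # L2 norm of each input feature over all calibration tokens.
    feat_norm = [math.sqrt(sum(x[j] ** 2 for x in calib_inputs)) for j in range(n_in)]
    n_prune = int(n_in * sparsity)
    pruned = []
    for row in weight:
        # Wanda score |W_ij| * ||X_j||_2, compared within each output row.
        scores = [abs(w) * feat_norm[j] for j, w in enumerate(row)]
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


def batch_topk(pre_acts, k):
    """Sketch of the BatchTopK activation used in sparse autoencoders.

    pre_acts holds encoder pre-activations, one list per sample. After a
    ReLU, only the k * batch_size largest values in the whole batch survive,
    so individual samples may keep more or fewer than k latents.
    """
    budget = k * len(pre_acts)
    flat = [(max(v, 0.0), b, j)
            for b, row in enumerate(pre_acts) for j, v in enumerate(row)]
    top = sorted(flat, key=lambda t: t[0], reverse=True)[:budget]
    keep = {(b, j) for v, b, j in top if v > 0.0}
    return [[max(v, 0.0) if (b, j) in keep else 0.0 for j, v in enumerate(row)]
            for b, row in enumerate(pre_acts)]


W = [[1.0, -2.0, 0.25, 3.0], [0.1, 0.2, -4.0, 1.0]]
X = [[1.0, 0.0, 2.0, 0.0], [1.0, 1.0, 2.0, 0.1]]
print(wanda_prune(W, X, 0.5))
# [[1.0, -2.0, 0.0, 0.0], [0.0, 0.2, -4.0, 0.0]]
print(batch_topk([[3.0, 2.5, 2.8], [0.1, 0.2, 0.3]], k=1))
# [[3.0, 0.0, 2.8], [0.0, 0.0, 0.0]]
```

In the Wanda example, the largest weight in the first row (3.0) is pruned because its input feature is nearly silent. Pure magnitude pruning would have kept it. In the BatchTopK example, the first sample keeps two latents and the second keeps none, which shows the per-sample flexibility of the batch-level budget.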
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
