Sparse but not Simpler: A Multi-Level Interpretability Analysis of Vision Transformers

Siyu Zhang

arXiv:2603.15919 · cs.CV · March 24, 2026

TL;DR

This paper systematically evaluates whether structural sparsity in Vision Transformers improves interpretability. It finds that pruning reduces the number of circuit edges but does not improve interpretability across the multiple levels of analysis considered.

Contribution

Introduces IMPACT, a comprehensive multi-level framework for evaluating interpretability in vision models, and provides empirical evidence that sparsity alone does not improve interpretability.

Findings

01. Sparse models have fewer circuit edges but similar or more active nodes.

02. Pruning redistributes computation rather than simplifying functions.

03. No significant improvements in neuron selectivity or attribution faithfulness.

Abstract

Sparse neural networks are often hypothesized to be more interpretable than dense models, motivated by findings that weight sparsity can produce compact circuits in language models. However, it remains unclear whether structural sparsity itself leads to improved semantic interpretability. In this work, we systematically evaluate the relationship between weight sparsity and interpretability in Vision Transformers using DeiT-III B/16 models pruned with Wanda. To assess interpretability comprehensively, we introduce IMPACT, a multi-level framework that evaluates interpretability across four complementary levels: neurons, layer representations, task circuits, and model-level attribution. Layer representations are analyzed using BatchTopK sparse autoencoders, circuits are extracted via learnable node masking, and explanations are evaluated with transformer attribution using…
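
For readers unfamiliar with Wanda, the pruning criterion named in the abstract can be sketched in a few lines. Wanda scores each weight by its magnitude times the L2 norm of the matching input feature over a calibration set, then removes the lowest-scoring weights within each output row. The block below is a toy, pure-Python illustration of that scoring rule only, not the paper's pipeline: the function name `wanda_prune`, the matrix `W`, and the calibration inputs `X` are hypothetical and chosen for illustration.

```python
import math


def wanda_prune(weights, activations, sparsity):
    """Zero the lowest-scoring weights in each output row (toy sketch).

    weights:     list of rows, shape (out_features, in_features)
    activations: calibration inputs, shape (n_samples, in_features)
    sparsity:    fraction of weights to remove per row, in [0, 1]
    """
    n_in = len(weights[0])
    # L2 norm of each input feature across the calibration samples.
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_in)]
    n_prune = int(n_in * sparsity)

    pruned = []
    for row in weights:
        # Wanda score: |weight| times the norm of the matching input feature.
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        # Indices of the n_prune smallest scores within this row.
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Hypothetical toy layer: 2 outputs, 4 inputs.
W = [[0.5, -2.0, 0.1, 1.0],
     [3.0, 0.2, -0.4, 0.05]]
# Hypothetical calibration inputs: feature 1 is nearly silent, feature 2 is large.
X = [[1.0, 0.1, 10.0, 1.0],
     [1.0, 0.1, 10.0, 1.0]]

pruned = wanda_prune(W, X, sparsity=0.5)
print(pruned)
```

In this toy case the largest-magnitude weight in the first row (-2.0) is removed because its input feature is almost inactive, while the small weight 0.1 survives because its input is large. That is the sense in which the criterion differs from plain magnitude pruning.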

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning