Entropy Reveals Block Importance in Masked Self-Supervised Vision Transformers
Peihao Xiang, Kaida Wu, Ou Bai

TL;DR
This paper introduces Gardener, a data-free, one-shot method that uses the information entropy of pretrained block weights to identify and prune redundant blocks in masked self-supervised vision transformers, reducing model size with minimal performance loss.
Contribution
The paper demonstrates that block importance can be estimated from weight entropy alone, without any data, enabling effective one-shot block-level pruning of vision transformers.
Findings
Gardener matches or outperforms existing data-free pruning methods.
Up to 91.7% of blocks can be pruned with minimal performance loss.
Entropy correlates strongly with block importance in pretrained models.
Abstract
Masked self-supervised vision transformers have become a dominant pretraining paradigm, yet their substantial model size poses significant challenges for resource-constrained deployment and efficient transfer learning. A fundamental question remains: are all transformer blocks equally important for downstream performance? In this paper, we show that block importance in masked self-supervised vision transformers can be accurately estimated without access to any data. Our key finding is that the information entropy of pretrained block weights strongly correlates with oracle sensitivity obtained via iterative block removal and finetuning. This observation enables Gardener, a data-free, one-shot, block-level pruning principle that identifies redundant blocks through simple information-theoretic measurements. We evaluate Gardener on VideoMAE-B across multiple pruning ratios and downstream…
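The idea of scoring blocks by the entropy of their weights can be illustrated with a minimal sketch. The abstract does not say how the entropy is computed or which direction of the score marks a block as redundant, so everything here is an assumption made for illustration: the histogram-based Shannon entropy estimator, the bin count, the helper names (weight_entropy, rank_blocks_by_entropy, select_blocks_to_keep), and the keep_high_entropy flag. It is not the authors' implementation.

```python
import numpy as np


def weight_entropy(weights, num_bins=256):
    """Shannon entropy (in bits) of a histogram over one block's weight values.

    `weights` is a list of arrays (e.g. the block's parameter tensors).
    The histogram estimator and bin count are assumptions, not the paper's recipe.
    """
    flat = np.concatenate([np.asarray(w, dtype=np.float64).ravel() for w in weights])
    counts, _ = np.histogram(flat, bins=num_bins)
    probs = counts[counts > 0] / flat.size  # drop empty bins to avoid log(0)
    return float(-(probs * np.log2(probs)).sum())


def rank_blocks_by_entropy(blocks, num_bins=256):
    """Return per-block entropies and block indices sorted by ascending entropy."""
    entropies = [weight_entropy(b, num_bins) for b in blocks]
    order = sorted(range(len(blocks)), key=lambda i: entropies[i])
    return entropies, order


def select_blocks_to_keep(blocks, num_keep, num_bins=256, keep_high_entropy=True):
    """Pick `num_keep` block indices in one shot, using weights only (no data).

    Whether high or low entropy signals importance is an assumption here,
    exposed through the hypothetical `keep_high_entropy` flag.
    """
    if num_keep <= 0:
        return []
    _, order = rank_blocks_by_entropy(blocks, num_bins)
    chosen = order[-num_keep:] if keep_high_entropy else order[:num_keep]
    return sorted(chosen)  # preserve the original block order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for 12 transformer blocks, each with two weight matrices.
    toy_blocks = [
        [rng.normal(scale=0.02 * (i + 1), size=(64, 64)), rng.normal(size=(64,))]
        for i in range(12)
    ]
    entropies, _ = rank_blocks_by_entropy(toy_blocks)
    kept = select_blocks_to_keep(toy_blocks, num_keep=1)
    print("entropies:", [round(e, 3) for e in entropies])
    print("kept block indices:", kept)
```

With a 12-block backbone, keeping a single block corresponds to removing 11 of 12 blocks, about 91.7%. In practice the per-block arrays would come from a pretrained checkpoint instead of random toy data, and the pruned model would still need to be evaluated on downstream tasks.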
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
