Entropy Reveals Block Importance in Masked Self-Supervised Vision Transformers

Peihao Xiang; Kaida Wu; Ou Bai

arXiv:2602.03918 · cs.CV · February 5, 2026

TL;DR

This paper introduces Gardener, a data-free, one-shot method that uses the information entropy of pretrained block weights to identify and prune redundant blocks in masked self-supervised vision transformers, reducing model size with minimal loss in downstream performance.

Contribution

The paper demonstrates that block importance can be estimated from the entropy of pretrained weights alone, without access to any data, which enables one-shot, block-level pruning of vision transformers.

Findings

01 Gardener matches or outperforms existing data-free pruning methods.

02 Up to 91.7% of blocks can be pruned with minimal performance loss.

03 Entropy correlates strongly with block importance in pretrained models.
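
As a rough sanity check on the 91.7% figure: it is consistent with removing 11 of 12 blocks, assuming the 12-block encoder of a ViT-B style backbone. The excerpt does not state the depth or which blocks are counted, so `DEPTH` and the helper `pruning_ratio` below are illustrative assumptions, not values taken from the paper.

```python
# Assumption: a ViT-B style encoder with 12 transformer blocks.
DEPTH = 12


def pruning_ratio(num_removed, depth=DEPTH):
    """Percentage of blocks removed from an encoder with `depth` blocks."""
    return 100.0 * num_removed / depth


# Pruning ratio for every possible number of removed blocks (1..11).
ratios = {k: round(pruning_ratio(k), 1) for k in range(1, DEPTH)}

for removed, pct in ratios.items():
    print(f"remove {removed:2d} of {DEPTH} blocks: {pct}%")
```

Under this assumption, the highest reported ratio would leave a single transformer block in the encoder.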

Abstract

Masked self-supervised vision transformers have become a dominant pretraining paradigm, yet their substantial model size poses significant challenges for resource-constrained deployment and efficient transfer learning. A fundamental question remains: are all transformer blocks equally important for downstream performance? In this paper, we show that block importance in masked self-supervised vision transformers can be accurately estimated without access to any data. Our key finding is that the information entropy of pretrained block weights strongly correlates with oracle sensitivity obtained via iterative block removal and finetuning. This observation enables Gardener, a data-free, one-shot, block-level pruning principle that identifies redundant blocks through simple information-theoretic measurements. We evaluate Gardener on VideoMAE-B across multiple pruning ratios and downstream…
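
The idea of scoring blocks by weight entropy can be sketched as follows. This is a minimal illustration, not the authors' implementation: the truncated abstract does not give the entropy estimator, so the histogram-based Shannon entropy, the bin count, and the rule "prune the lowest-entropy blocks first" are all assumptions, as are the function names `weight_entropy` and `blocks_to_prune`. Toy lists of numbers stand in for pretrained block weights.

```python
import math
import random


def weight_entropy(weights, num_bins=64):
    """Shannon entropy (bits) of a histogram over a flat list of weights.

    Assumption: equal-width histogram binning; the paper may use a
    different estimator.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return 0.0  # all weights identical: one occupied bin, zero entropy
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for w in weights:
        # Clamp so the maximum value falls in the last bin.
        idx = min(int((w - lo) / width), num_bins - 1)
        counts[idx] += 1
    total = len(weights)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def blocks_to_prune(block_weights, prune_ratio):
    """Score every block, then pick the lowest-entropy ones for removal.

    Assumption: lower entropy is treated as lower importance.
    Returns (sorted indices to prune, per-block entropy scores).
    """
    scores = [weight_entropy(w) for w in block_weights]
    num_prune = int(round(prune_ratio * len(block_weights)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:num_prune]), scores


# Toy stand-ins for the flattened weights of four blocks (hypothetical data).
rng = random.Random(0)
toy_blocks = [
    [rng.gauss(0.0, 1.0) for _ in range(2000)],       # spread-out weights
    [0.5] * 2000,                                     # constant weights
    [rng.choice([-1.0, 1.0]) for _ in range(2000)],   # only two distinct values
    [rng.uniform(-1.0, 1.0) for _ in range(2000)],    # near-uniform weights
]

pruned, scores = blocks_to_prune(toy_blocks, prune_ratio=0.5)
print("entropy per block:", [round(s, 2) for s in scores])
print("blocks selected for pruning:", pruned)
```

Because the scoring needs only the weights, no forward passes or data samples are involved, which is the sense in which such a criterion is data-free and one-shot. In a real model each list would be replaced by the flattened parameters of one transformer block.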

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning