Elastic ViTs from Pretrained Models without Retraining

Walter Simoncini; Michael Dorkenwald; Tijmen Blankevoort; Cees G.M. Snoek; Yuki M. Asano

arXiv:2510.17700 · cs.CV · October 21, 2025

TL;DR

This paper introduces SnapViT, a fast, retraining-free structured pruning method that turns pretrained Vision Transformers into elastic models whose size can be adjusted to a range of compute budgets, while outperforming prior pruning methods at the sparsities tested.

Contribution

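The paper proposes an efficient pruning strategy for pretrained Vision Transformers that combines gradient information with cross-network structure correlations, approximated by an evolutionary algorithm. Importance scoring is self-supervised, so no labeled data or classification head is needed, and the resulting models support elastic inference.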

Findings

1. Outperforms state-of-the-art pruning methods across various sparsities.

2. Generates elastic models in less than five minutes on a single A100 GPU.

3. Maintains high performance without requiring retraining or labeled data.

Abstract

Vision foundation models achieve remarkable performance but are only available in a limited set of pre-determined sizes, forcing sub-optimal deployment choices under real-world constraints. We introduce SnapViT: Single-shot network approximation for pruned Vision Transformers, a new post-pretraining structured pruning method that enables elastic inference across a continuum of compute budgets. Our approach efficiently combines gradient information with cross-network structure correlations, approximated via an evolutionary algorithm, does not require labeled data, generalizes to models without a classification head, and is retraining-free. Experiments on DINO, SigLIPv2, DeIT, and AugReg models demonstrate superior performance over state-of-the-art methods across various sparsities, requiring less than five minutes on a single A100 GPU to generate elastic models that can be adjusted to…
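To make the idea of "elastic inference across a continuum of compute budgets" concrete, here is a minimal sketch of rank-once, slice-anywhere structured pruning. It is not SnapViT's scoring: the importance score below is a generic first-order Taylor proxy (|weight · gradient| per unit), and the evolutionary approximation of cross-network correlations described in the abstract is omitted entirely. The function names (`unit_importance`, `rank_units`, `keep_mask`) and the toy numbers are hypothetical.

```python
# Illustrative only: a generic first-order importance proxy, NOT SnapViT's scoring.
# All names and numbers here are hypothetical.

def unit_importance(weights, grads):
    """Score each prunable unit as |sum_i w_i * g_i| (first-order Taylor proxy)."""
    return [abs(sum(w * g for w, g in zip(ws, gs))) for ws, gs in zip(weights, grads)]


def rank_units(scores):
    """Return unit indices ordered from most to least important (computed once)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def keep_mask(ranking, sparsity):
    """Keep the top (1 - sparsity) fraction of units; at least one unit survives."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n = len(ranking)
    n_keep = max(1, round(n * (1.0 - sparsity)))
    kept = set(ranking[:n_keep])
    return [i in kept for i in range(n)]


# Toy example: 4 hypothetical units, each with 2 weights and matching gradients.
weights = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0], [0.1, 0.2]]
grads = [[0.1, 0.1], [1.0, 1.0], [0.5, 0.1], [0.0, 0.1]]

scores = unit_importance(weights, grads)
ranking = rank_units(scores)         # one-off ranking
mask_50 = keep_mask(ranking, 0.50)   # 50% sparsity budget
mask_25 = keep_mask(ranking, 0.25)   # 25% sparsity budget, same ranking reused
```

Because every budget is a prefix of the same ranking, the units kept at a higher sparsity are always a subset of those kept at a lower one, which is what allows a single scoring pass to serve many deployment targets in this simplified setting.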

Videos

Elastic ViTs from Pretrained Models without Retraining · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications