EA-ViT: Efficient Adaptation for Elastic Vision Transformer

Chen Zhu; Wangbo Zhao; Huiwen Zhang; Samir Khaki; Yuhao Zhou; Weidong Tang; Shuo Wang; Zhihang Yuan; Yuzhang Shang; Xiaojiang Peng; Kai Wang; Dawei Yang

arXiv:2507.19360·cs.CV·July 28, 2025


Open Access

TL;DR

EA-ViT introduces a flexible, multi-size Vision Transformer adaptation framework that efficiently generates models tailored to various resource constraints through a two-stage process involving elastic architecture enhancement and a lightweight router.

Contribution

The paper presents a novel elastic architecture and a router-based adaptation method enabling a single ViT to produce multiple resource-efficient models.

Findings

01. Effective multi-size ViT models for diverse platforms

02. Stable adaptation with curriculum-based training

03. Router optimized with Pareto-efficient configurations

Abstract

Vision Transformers (ViTs) have emerged as a foundational model in computer vision, excelling in generalization and adaptation to downstream tasks. However, deploying ViTs to support diverse resource constraints typically requires retraining multiple, size-specific ViTs, which is both time-consuming and energy-intensive. To address this issue, we propose an efficient ViT adaptation framework that enables a single adaptation process to generate multiple models of varying sizes for deployment on platforms with various resource constraints. Our approach comprises two stages. In the first stage, we enhance a pre-trained ViT with a nested elastic architecture that enables structural flexibility across MLP expansion ratio, number of attention heads, embedding dimension, and network depth. To preserve pre-trained knowledge and ensure stable adaptation, we adopt a curriculum-based training…
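To make the idea of a nested elastic architecture more concrete, here is a minimal sketch in plain Python. The abstract does not say how sub-models are extracted, so this assumes a common weight-sharing scheme in which a smaller sub-model reuses the leading rows and columns of the full model's weight matrices. The names `SubnetConfig`, `slice_linear`, and `elastic_mlp_weights` are hypothetical and do not come from the paper, and only the MLP part of a block is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetConfig:
    """Hypothetical description of one sub-model along the four elastic
    dimensions named in the abstract. Field names are assumptions."""
    embed_dim: int
    num_heads: int
    mlp_ratio: float
    depth: int


def slice_linear(weight, out_dim, in_dim):
    """Keep the leading out_dim rows and in_dim columns of a weight
    matrix stored as a list of rows (assumed nesting scheme)."""
    if out_dim > len(weight) or in_dim > len(weight[0]):
        raise ValueError("sub-model cannot exceed the full model")
    return [row[:in_dim] for row in weight[:out_dim]]


def elastic_mlp_weights(full_fc1, full_fc2, cfg):
    """Extract the two MLP weight matrices for a sub-model.

    full_fc1 has shape (hidden, embed); full_fc2 has shape (embed, hidden).
    """
    hidden = int(cfg.embed_dim * cfg.mlp_ratio)
    fc1 = slice_linear(full_fc1, hidden, cfg.embed_dim)
    fc2 = slice_linear(full_fc2, cfg.embed_dim, hidden)
    return fc1, fc2


# Toy full model: embedding dimension 8, MLP expansion ratio 4 (hidden 32).
FULL_EMBED, FULL_HIDDEN = 8, 32
full_fc1 = [[r * 100 + c for c in range(FULL_EMBED)] for r in range(FULL_HIDDEN)]
full_fc2 = [[r * 100 + c for c in range(FULL_HIDDEN)] for r in range(FULL_EMBED)]

# Two sub-models of different sizes drawn from the same full weights.
medium = SubnetConfig(embed_dim=6, num_heads=3, mlp_ratio=4.0, depth=10)
small = SubnetConfig(embed_dim=4, num_heads=2, mlp_ratio=2.0, depth=6)

medium_fc1, medium_fc2 = elastic_mlp_weights(full_fc1, full_fc2, medium)
small_fc1, small_fc2 = elastic_mlp_weights(full_fc1, full_fc2, small)

# Nesting: slicing the medium sub-model again gives the small sub-model.
small_from_medium = slice_linear(medium_fc1, len(small_fc1), small.embed_dim)
```

Under this assumed scheme, every smaller configuration is contained in every larger one, so a single set of weights can serve all sizes. In this sketch `num_heads` and `depth` are carried in the configuration only; slicing attention heads and skipping blocks would follow the same pattern.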


Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning