Rethinking Vision Transformer Depth via Structural Reparameterization
Chengwei Zhou, Vipin Chaudhary, Gourav Datta

TL;DR
This paper introduces a structural reparameterization method, applied during training, that lets Vision Transformers run with fewer layers at inference while maintaining accuracy and improving speed, challenging the need for very deep models.
Contribution
We propose a branch-based reparameterization technique that consolidates transformer branches into single-path models, reducing depth without accuracy loss.
Findings
Reduced ViT-Tiny from 12 to 3-6 layers while maintaining accuracy
Achieved up to 37% inference speedup on mobile CPUs
Challenged the necessity of very deep transformer stacks
Abstract
The computational overhead of Vision Transformers in practice stems fundamentally from their deep architectures, yet existing acceleration strategies have primarily targeted algorithmic-level optimizations such as token pruning and attention speedup. This leaves an underexplored research question: can we reduce the number of stacked transformer layers while maintaining comparable representational capacity? To answer this, we propose a branch-based structural reparameterization technique that operates during the training phase. Our approach leverages parallel branches within transformer blocks that can be systematically consolidated into streamlined single-path models suitable for inference deployment. The consolidation mechanism works by gradually merging branches at the entry points of nonlinear components, enabling both feed-forward networks (FFN) and multi-head self-attention (MHSA)…
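The general idea of consolidating parallel branches at the entry point of a nonlinearity can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the simplest case, where several linear branches are summed just before a nonlinear function, so that by linearity their weights and biases can be added into a single layer. The gradual merging schedule and the handling of MHSA mentioned in the abstract are not reproduced. All function names are hypothetical, and ReLU is used as a stand-in nonlinearity.

```python
import random


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(u, v):
    # Elementwise sum of two vectors.
    return [a + b for a, b in zip(u, v)]


def relu(v):
    # Stand-in nonlinearity (an assumption, not taken from the paper).
    return [max(0.0, a) for a in v]


def random_linear(rng, out_dim, in_dim):
    # One linear branch: a weight matrix and a bias vector.
    W = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [rng.uniform(-1, 1) for _ in range(out_dim)]
    return W, b


def multi_branch(x, branches):
    # Training-time form: sum the branch outputs, then apply the nonlinearity.
    total = [0.0] * len(branches[0][1])
    for W, b in branches:
        total = add(total, add(matvec(W, x), b))
    return relu(total)


def merge_branches(branches):
    # Inference-time form: add weights and biases across branches.
    # Valid because sum_k (W_k x + b_k) = (sum_k W_k) x + sum_k b_k.
    W_sum = [
        [sum(vals) for vals in zip(*rows)]
        for rows in zip(*(W for W, _ in branches))
    ]
    b_sum = [sum(vals) for vals in zip(*(b for _, b in branches))]
    return W_sum, b_sum


def single_path(x, merged):
    # Single linear layer followed by the same nonlinearity.
    W, b = merged
    return relu(add(matvec(W, x), b))


rng = random.Random(0)
branches = [random_linear(rng, 4, 3) for _ in range(2)]
x = [rng.uniform(-1, 1) for _ in range(3)]

y_train = multi_branch(x, branches)
y_infer = single_path(x, merge_branches(branches))
max_diff = max(abs(a - b) for a, b in zip(y_train, y_infer))
```

Here `max_diff` should be zero up to floating-point rounding, which shows that the merged single-path layer computes the same function as the two-branch form in this simplified setting.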
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors
