Rethinking Vision Transformer Depth via Structural Reparameterization

Chengwei Zhou; Vipin Chaudhary; Gourav Datta

arXiv:2511.19718·cs.CV·November 26, 2025


Open Access

TL;DR

This paper introduces a structural reparameterization method, applied during training, that lets Vision Transformers be deployed with fewer layers at inference while maintaining accuracy and running faster, challenging the assumption that very deep models are needed.

Contribution

We propose a branch-based reparameterization technique that consolidates transformer branches into single-path models, reducing depth without accuracy loss.

Findings

01. Reduced ViT-Tiny from 12 layers to 3-6 layers while maintaining accuracy

02. Achieved up to 37% inference speedup on mobile CPUs

03. Challenged the necessity of very deep transformer stacks

Abstract

The computational overhead of Vision Transformers in practice stems fundamentally from their deep architectures, yet existing acceleration strategies have primarily targeted algorithmic-level optimizations such as token pruning and attention speedup. This leaves an underexplored research question: can we reduce the number of stacked transformer layers while maintaining comparable representational capacity? To answer this, we propose a branch-based structural reparameterization technique that operates during the training phase. Our approach leverages parallel branches within transformer blocks that can be systematically consolidated into streamlined single-path models suitable for inference deployment. The consolidation mechanism works by gradually merging branches at the entry points of nonlinear components, enabling both feed-forward networks (FFN) and multi-head self-attention (MHSA)…
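The core idea of consolidating parallel branches at the entry point of a nonlinear component can be illustrated with a minimal sketch. This is not code from the paper: it assumes the simplest possible case, in which several parallel linear branches are summed before a GELU, so that their weights and biases can be folded into one linear layer. The names `linear`, `gelu`, and `merge_linear_branches` are hypothetical, and the gradual merging schedule and the MHSA case described in the abstract are not shown.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = x @ weight + bias to one input vector (plain Python lists)."""
    return [
        sum(x_i * weight[i][j] for i, x_i in enumerate(x)) + bias[j]
        for j in range(len(bias))
    ]


def gelu(values):
    """Tanh approximation of GELU, applied elementwise."""
    c = math.sqrt(2.0 / math.pi)
    return [0.5 * t * (1.0 + math.tanh(c * (t + 0.044715 * t ** 3))) for t in values]


def merge_linear_branches(weights, biases):
    """Fold parallel branches sum_k (x @ W_k + b_k) into a single (W, b) pair."""
    rows, cols = len(weights[0]), len(weights[0][0])
    merged_w = [[sum(w[i][j] for w in weights) for j in range(cols)] for i in range(rows)]
    merged_b = [sum(b[j] for b in biases) for j in range(cols)]
    return merged_w, merged_b


random.seed(0)
dim, hidden, num_branches = 4, 6, 3  # illustrative sizes, not from the paper
weights = [
    [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(dim)]
    for _ in range(num_branches)
]
biases = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(num_branches)]
x = [random.gauss(0, 1) for _ in range(dim)]

# Training-time view: every branch runs, outputs are summed, then the nonlinearity.
branch_outputs = [linear(x, w, b) for w, b in zip(weights, biases)]
pre_activation = [sum(vals) for vals in zip(*branch_outputs)]
multi_branch_out = gelu(pre_activation)

# Inference-time view: one merged linear layer feeds the same nonlinearity.
merged_w, merged_b = merge_linear_branches(weights, biases)
single_path_out = gelu(linear(x, merged_w, merged_b))

max_abs_diff = max(abs(a - b) for a, b in zip(multi_branch_out, single_path_out))
print(f"max |multi-branch - single-path| = {max_abs_diff:.2e}")
```

The merge is exact only because the branches are linear and are combined before the nonlinearity; once a nonlinear function sits inside a branch, the sum can no longer be folded this way, which is presumably why the abstract places the consolidation at the entry points of nonlinear components.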

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors