RePaViT: Scalable Vision Transformer Acceleration via Structural Reparameterization on Feedforward Network Layers

Xuwei Xu; Yang Li; Yudong Chen; Jiajun Liu; Sen Wang

arXiv:2505.21847·cs.CV·June 3, 2025

TL;DR

RePaViT applies structural reparameterization to the feedforward network (FFN) layers of Vision Transformers, speeding up inference by up to 68.7% with minimal accuracy loss.

Contribution

This work presents the first application of structural reparameterization to the FFN layers of ViTs, enabling substantial inference speed-ups.

Findings

1. RePaViT achieves up to a 68.7% speed-up on large models.

2. Accuracy losses are acceptable, and in some cases accuracy improves.

3. Speed benefits scale with model size: larger models gain more.

Abstract

We reveal that feedforward network (FFN) layers, rather than attention layers, are the primary contributors to Vision Transformer (ViT) inference latency, and that their impact grows as model size increases. This finding highlights a critical opportunity to improve the efficiency of large-scale ViTs by focusing on FFN layers. In this work, we propose a novel channel idle mechanism that facilitates post-training structural reparameterization for efficient FFN layers at test time. Specifically, in each FFN layer a set of feature channels remains idle and bypasses the nonlinear activation function, forming a linear pathway that enables structural reparameterization during inference. This mechanism results in a family of ReParameterizable Vision Transformers (RePaViTs), which achieve remarkable latency reductions with acceptable sacrifices (sometimes gains) in accuracy across…
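
The channel idle idea in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes a plain two-layer FFN with GELU, leaves out normalization and the residual connection, and uses hypothetical names (`ffn_train`, `reparameterize`, `ffn_infer`, `n_active`). It only shows why channels that skip the activation can be folded into a single linear map after training.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn_train(x, W1, b1, W2, b2, n_active):
    """Training-time form: only the first n_active hidden channels are activated;
    the remaining (idle) channels pass through unchanged."""
    h = x @ W1 + b1
    h = np.concatenate([gelu(h[:, :n_active]), h[:, n_active:]], axis=1)
    return h @ W2 + b2

def reparameterize(W1, b1, W2, b2, n_active):
    """Fold the idle (purely linear) path into one d x d matrix and one bias."""
    W1_a, W1_i = W1[:, :n_active], W1[:, n_active:]
    b1_a, b1_i = b1[:n_active], b1[n_active:]
    W2_a, W2_i = W2[:n_active], W2[n_active:]
    W_lin = W1_i @ W2_i            # merged weights of the idle path
    b_out = b1_i @ W2_i + b2       # merged bias of the idle path plus output bias
    return W1_a, b1_a, W2_a, W_lin, b_out

def ffn_infer(x, W1_a, b1_a, W2_a, W_lin, b_out):
    """Inference-time form: a narrow nonlinear branch plus one linear branch."""
    return gelu(x @ W1_a + b1_a) @ W2_a + x @ W_lin + b_out

# Toy sizes chosen for illustration only (assumption, not from the paper).
rng = np.random.default_rng(0)
d, hidden, n_active = 8, 32, 8
x = rng.standard_normal((4, d))
W1, b1 = rng.standard_normal((d, hidden)), rng.standard_normal(hidden)
W2, b2 = rng.standard_normal((hidden, d)), rng.standard_normal(d)

y_train = ffn_train(x, W1, b1, W2, b2, n_active)
params = reparameterize(W1, b1, W2, b2, n_active)
y_infer = ffn_infer(x, *params)

weights_before = W1.size + W2.size
weights_after = params[0].size + params[2].size + params[3].size
```

In this toy setting the two forms produce the same output up to floating-point error, while the weight count drops from `2 * d * hidden` to `2 * d * n_active + d * d`. The latency figures reported for RePaViT depend on details this sketch leaves out.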

Code & Models

Models

Ackesnal/RePaViT (Hugging Face model)

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing · Infrared Target Detection Methodologies