RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding

TL;DR
This paper revisits lightweight CNNs from a ViT perspective, proposing RepViT, a new family of CNNs that outperform state-of-the-art models in accuracy and latency on mobile devices.
Contribution
It introduces RepViT, a novel lightweight CNN architecture inspired by ViT designs, demonstrating superior performance and efficiency on mobile vision tasks.
Findings
RepViT achieves over 80% top-1 accuracy on ImageNet with 1.0 ms latency on an iPhone 12.
RepViT outperforms existing lightweight ViTs in accuracy and speed.
RepViT-SAM is nearly 10 times faster than MobileSAM.
Abstract
Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance and lower latency, compared with lightweight Convolutional Neural Networks (CNNs), on resource-constrained mobile devices. Researchers have discovered many structural connections between lightweight ViTs and lightweight CNNs. However, the notable architectural disparities in the block structure, macro, and micro designs between them have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs from ViT perspective and emphasize their promising prospect for mobile devices. Specifically, we incrementally enhance the mobile-friendliness of a standard lightweight CNN, i.e., MobileNetV3, by integrating the efficient architectural designs of lightweight ViTs. This ends up with a new family of pure lightweight CNNs, namely RepViT. Extensive experiments show that RepViT…
Code & Models
- 🤗 timm/repvit_m1.dist_in1k · model · 90k dl · ♡ 1
- 🤗 timm/repvit_m2.dist_in1k · model · 67 dl
- 🤗 timm/repvit_m3.dist_in1k · model · 73 dl · ♡ 1
- 🤗 timm/repvit_m0_9.dist_300e_in1k · model · 4.1k dl · ♡ 1
- 🤗 timm/repvit_m0_9.dist_450e_in1k · model · 1.7k dl · ♡ 2
- 🤗 timm/repvit_m1_0.dist_300e_in1k · model · 1.9k dl
- 🤗 timm/repvit_m1_0.dist_450e_in1k · model · 441 dl
- 🤗 timm/repvit_m1_1.dist_300e_in1k · model · 836 dl
- 🤗 timm/repvit_m1_1.dist_450e_in1k · model · 325 dl
- 🤗 timm/repvit_m1_5.dist_300e_in1k · model · 3.3k dl
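The checkpoints listed above are published under the timm organisation, so they can most likely be loaded through `timm.create_model`. The following is a minimal sketch, not an official recipe: it assumes `timm` and `torch` are installed, that the installed timm version registers the RepViT architectures, and that a 224x224 RGB input is appropriate. The helper names `split_checkpoint`, `load_repvit`, and `classify` are hypothetical and exist only for illustration.

```python
from typing import Tuple


def split_checkpoint(name: str) -> Tuple[str, str]:
    """Split a timm checkpoint name into (architecture, pretrained tag).

    timm names follow the pattern "<architecture>.<tag>", for example
    "repvit_m1_0.dist_450e_in1k".
    """
    arch, _, tag = name.partition(".")
    return arch, tag


def load_repvit(name: str = "repvit_m1_0.dist_450e_in1k", pretrained: bool = True):
    """Build a RepViT model with timm and switch it to inference mode.

    With pretrained=True the weights are downloaded, which needs network access.
    """
    import timm  # third-party; imported lazily so the helper above stays dependency-free

    model = timm.create_model(name, pretrained=pretrained)
    return model.eval()


def classify(model, batch):
    """Return class probabilities for a batch of shape (N, 3, H, W)."""
    import torch  # third-party

    with torch.no_grad():
        return model(batch).softmax(dim=-1)


# Example usage (assumes a 224x224 input, which is an assumption of this sketch):
#   import torch
#   model = load_repvit("repvit_m1_0.dist_450e_in1k")
#   probs = classify(model, torch.randn(1, 3, 224, 224))
#   print(probs.shape)
```

Real images should be preprocessed with the normalisation the checkpoint was trained with rather than fed in as random tensors; the random input here only checks that the model runs end to end.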
Taxonomy
Topics: Robotics and Automated Systems · Anomaly Detection Techniques and Applications
Methods: Segment Anything Model · Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · ReLU6 · Dense Connections · Sigmoid Activation · Batch Normalization · 1x1 Convolution · Squeeze-and-Excitation Block
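Several of the methods listed above are closely related: a depthwise separable convolution is a depthwise convolution followed by a pointwise (1x1) convolution. The sketch below is generic arithmetic rather than anything taken from the paper. It compares the weight count of a standard convolution with that of its depthwise separable counterpart, ignoring biases; the channel sizes and function names are illustrative assumptions.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution plus a pointwise 1x1 convolution."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer: 64 input channels, 128 output channels, 3x3 kernel.
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
print(standard, separable, round(standard / separable, 2))
```

For this illustrative layer the separable form needs roughly an eighth of the weights, which is why the factorisation is common in lightweight CNNs.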
