RepViT: Revisiting Mobile CNN From ViT Perspective

Ao Wang; Hui Chen; Zijia Lin; Jungong Han; Guiguang Ding

arXiv:2307.09283 · cs.CV · March 15, 2024 · 28 cites

TL;DR

This paper revisits lightweight CNNs from a ViT perspective, proposing RepViT, a new family of CNNs that outperform state-of-the-art models in accuracy and latency on mobile devices.

Contribution

The paper introduces RepViT, a family of pure lightweight CNNs obtained by incrementally adding the efficient architectural designs of lightweight ViTs to MobileNetV3, and shows that it improves both accuracy and latency on mobile vision tasks.

Findings

01. RepViT achieves over 80% top-1 accuracy on ImageNet with 1.0 ms latency on an iPhone 12.
02. RepViT outperforms existing state-of-the-art lightweight ViTs in accuracy and latency.
03. RepViT-SAM runs inference nearly 10 times faster than MobileSAM.

Abstract

Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on resource-constrained mobile devices. Researchers have discovered many structural connections between lightweight ViTs and lightweight CNNs. However, the notable architectural disparities between them in block structure, macro design, and micro design have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs from a ViT perspective and emphasize their promising prospects for mobile devices. Specifically, we incrementally enhance the mobile-friendliness of a standard lightweight CNN, i.e., MobileNetV3, by integrating the efficient architectural designs of lightweight ViTs. This results in a new family of pure lightweight CNNs, namely RepViT. Extensive experiments show that RepViT…
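
The abstract leaves the block internals out. As we understand it, the "Rep" in RepViT refers to structural reparameterization: a depthwise 3×3 convolution is trained alongside extra branches (here a 1×1 depthwise convolution and an identity shortcut), and at inference the branches are merged into a single 3×3 kernel so the deployed model runs only one convolution. The following is a minimal single-channel sketch under that assumption, in pure Python. The branch layout, the BatchNorm placed only on the 3×3 branch, the BatchNorm statistics, and the helper names (`conv2d_single`, `fold_bn`, `multi_branch`, `reparameterize`) are all illustrative, not the authors' code.

```python
import math
import random

EPS = 1e-5  # BatchNorm epsilon (illustrative value)


def conv2d_single(x, k, bias):
    """Same-padded 2D cross-correlation of one channel x (H x W) with an odd-sized square kernel k."""
    h, w = len(x), len(x[0])
    r = len(k) // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                        acc += k[di + r][dj + r] * x[ii][jj]
            row.append(acc)
        out.append(row)
    return out


def fold_bn(k, b, gamma, beta, mean, var):
    """Fold an inference-mode BatchNorm into the preceding conv's kernel and bias."""
    s = gamma / math.sqrt(var + EPS)
    return [[v * s for v in row] for row in k], beta + (b - mean) * s


def multi_branch(x, k3, b3, bn, k1, b1):
    """Hypothetical training-time form: BN(3x3 depthwise conv) + 1x1 depthwise conv + identity."""
    gamma, beta, mean, var = bn
    s = gamma / math.sqrt(var + EPS)
    y3 = conv2d_single(x, k3, b3)
    y1 = conv2d_single(x, [[k1]], b1)
    return [[(y3[i][j] - mean) * s + beta + y1[i][j] + x[i][j]
             for j in range(len(x[0]))] for i in range(len(x))]


def reparameterize(k3, b3, bn, k1, b1):
    """Inference-time form: a single 3x3 kernel and bias equivalent to multi_branch."""
    k, b = fold_bn(k3, b3, *bn)  # returns a fresh kernel, so k3 is not modified
    k[1][1] += k1 + 1.0  # the 1x1 weight and the identity both act on the center tap
    return k, b + b1


rng = random.Random(0)
x = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(6)]
k3 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
b3, k1, b1 = rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)
bn = (1.3, 0.2, 0.05, 0.8)  # hypothetical gamma, beta, running mean, running variance

train_out = multi_branch(x, k3, b3, bn, k1, b1)
k_merged, b_merged = reparameterize(k3, b3, bn, k1, b1)
infer_out = conv2d_single(x, k_merged, b_merged)

max_err = max(abs(a - c) for ra, rc in zip(train_out, infer_out) for a, c in zip(ra, rc))
print(f"max |train - inference| = {max_err:.1e}")
```

The merge works because every branch is linear in the input and, with same padding, the 1×1 and identity branches only touch the center tap of the 3×3 window; the printed difference should be at floating-point rounding level.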

Taxonomy

Topics: Robotics and Automated Systems · Anomaly Detection Techniques and Applications

Methods: Segment Anything Model · Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · ReLU6 · Dense Connections · Sigmoid Activation · Batch Normalization · 1x1 Convolution · Squeeze-and-Excitation Block
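
Depthwise separable convolution, listed above, is the factorization that MobileNet-style CNNs (including the MobileNetV3 baseline RepViT starts from) rely on: a depthwise k×k convolution with one filter per input channel, followed by a 1×1 pointwise convolution that mixes channels. The sketch below is a back-of-the-envelope cost comparison; the 3×3 kernel and the 56×56×64 feature map are illustrative values chosen here, not a configuration from the paper, and bias terms, stride, and the other block components (SE, activations, BatchNorm) are ignored.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Weights and multiply-accumulates of a k x k conv (stride 1, same padding, no bias)."""
    params = k * k * c_in * c_out
    return params, params * h * w  # every weight is applied once per output position


def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Depthwise k x k conv (one filter per input channel) followed by a pointwise 1x1 conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    params = depthwise + pointwise
    return params, params * h * w


std = standard_conv_cost(3, 64, 64, 56, 56)
dws = depthwise_separable_cost(3, 64, 64, 56, 56)
print(std)  # (36864, 115605504)
print(dws)  # (4672, 14651392)
print(round(std[0] / dws[0], 2))  # 7.89
```

The saving factor is 1 / (1/C_out + 1/k²), so for 3×3 kernels it approaches 9× as the channel count grows.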