EMOv2: Pushing 5M Vision Model Frontier
Jiangning Zhang, Teng Hu, Haoyang He, Zhucun Xue, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Dacheng Tao

TL;DR
This paper introduces EMOv2, a lightweight vision model with 5 million parameters that achieves state-of-the-art performance across various tasks by extending CNN and Transformer architectures with a unified, efficient design.
Contribution
The work develops a novel lightweight infrastructure combining CNN and attention mechanisms, including the improved i2RMB block, to push the performance frontier of 5M parameter models.
Findings
EMOv2 models outperform state-of-the-art methods on multiple vision tasks.
EMOv2-5M reaches 82.9% Top-1 accuracy when trained with a more robust training recipe.
Object detection with EMOv2-5M surpasses previous models by +2.6 mAP.
Abstract
This work focuses on developing parameter-efficient and lightweight models for dense predictions while trading off parameters, FLOPs, and performance. Our goal is to establish a new frontier for 5M-magnitude lightweight models on various downstream tasks. The Inverted Residual Block (IRB) serves as the infrastructure for lightweight CNNs, but no counterpart has been recognized for attention-based designs. Our work rethinks the lightweight infrastructure of the efficient IRB and the practical components of the Transformer from a unified perspective, extending the CNN-based IRB to attention-based models and abstracting a one-residual Meta Mobile Block (MMBlock) for lightweight model design. Following a neat but effective design criterion, we deduce a modern Improved Inverted Residual Mobile Block (i2RMB) and improve a hierarchical Efficient MOdel (EMOv2) with no elaborate complex structures. Considering the…
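To make the parameter-efficiency argument concrete, the sketch below counts the convolution weights in a standard MobileNetV2-style inverted residual block (1x1 expand, depthwise k x k, 1x1 project) and compares it with the same layout using a dense k x k convolution. This is not the paper's i2RMB or MMBlock; the function names, the expansion ratio of 4, and the choice to ignore normalization layers and biases are assumptions made purely for illustration.

```python
def conv_params(c_in, c_out, kernel, groups=1, bias=False):
    """Number of weights in a 2D convolution with a square kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


def inverted_residual_params(channels, expansion=4, kernel=3):
    """Weights in a generic IRB: 1x1 expand, depthwise kxk, 1x1 project.

    Assumption: normalization layers and biases are ignored.
    """
    hidden = channels * expansion
    expand = conv_params(channels, hidden, 1)
    # Depthwise convolution: one kxk filter per channel (groups == channels).
    depthwise = conv_params(hidden, hidden, kernel, groups=hidden)
    project = conv_params(hidden, channels, 1)
    return expand + depthwise + project


def dense_block_params(channels, expansion=4, kernel=3):
    """Same layout, but the middle convolution is dense instead of depthwise."""
    hidden = channels * expansion
    return (
        conv_params(channels, hidden, 1)
        + conv_params(hidden, hidden, kernel)
        + conv_params(hidden, channels, 1)
    )


if __name__ == "__main__":
    irb = inverted_residual_params(64)
    dense = dense_block_params(64)
    print(f"IRB weights:   {irb}")
    print(f"Dense weights: {dense}")
    print(f"Ratio:         {dense / irb:.1f}x")
```

With 64 input channels, the depthwise middle layer contributes only a small fraction of the block's weights, which is why IRB-style blocks are a common starting point for models in the few-million-parameter range.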
Taxonomy
Topics: Satellite Image Processing and Photogrammetry
Methods: Attention Is All You Need · Adam · Dropout · Feature Pyramid Network · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Depthwise Convolution · Byte Pair Encoding · Pointwise Convolution
