EMOv2: Pushing 5M Vision Model Frontier

Jiangning Zhang; Teng Hu; Haoyang He; Zhucun Xue; Yabiao Wang; Chengjie Wang; Yong Liu; Xiangtai Li; Dacheng Tao

arXiv:2412.06674·cs.CV·December 10, 2024

TL;DR

This paper introduces EMOv2, a lightweight vision model with 5 million parameters that achieves state-of-the-art performance across various tasks by extending CNN and Transformer architectures with a unified, efficient design.

Contribution

The work develops a novel lightweight infrastructure combining CNN and attention mechanisms, including the improved i2RMB block, to push the performance frontier of 5M parameter models.

Findings

01. EMOv2 models outperform state-of-the-art methods on multiple vision tasks.

02. EMOv2-5M achieves 82.9% Top-1 accuracy with robust training.

03. Object detection with EMOv2-5M surpasses previous models by +2.6 mAP.

Abstract

This work focuses on developing parameter-efficient and lightweight models for dense predictions while trading off parameters, FLOPs, and performance. Our goal is to establish a new frontier for 5M-magnitude lightweight models on various downstream tasks. The Inverted Residual Block (IRB) serves as the infrastructure for lightweight CNNs, but no counterpart has been recognized in attention-based design. Our work rethinks the lightweight infrastructure of the efficient IRB and the practical components of the Transformer from a unified perspective, extending the CNN-based IRB to attention-based models and abstracting a one-residual Meta Mobile Block (MMBlock) for lightweight model design. Following a neat but effective design criterion, we deduce a modern Improved Inverted Residual Mobile Block (i2RMB) and improve a hierarchical Efficient MOdel (EMOv2) with no elaborate complex structures. Considering the…
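To make the parameter and FLOPs trade-off mentioned in the abstract concrete, the sketch below counts convolution weights and multiply-accumulates for a generic MobileNetV2-style inverted residual block (1x1 expand, depthwise kxk, 1x1 project). It is not the i2RMB or MMBlock from the paper: the function names, the expansion ratio of 4, the 3x3 kernel, and the choice to ignore biases and normalization layers are all assumptions made for illustration.

```python
def irb_param_count(channels: int, expansion: int = 4, kernel_size: int = 3) -> dict:
    """Count conv weights in a generic inverted residual block.

    Assumptions (illustrative, not taken from the paper): input and output
    widths are equal, no biases, normalization parameters are ignored.
    """
    hidden = channels * expansion
    expand = channels * hidden               # 1x1 pointwise conv: C -> hidden
    depthwise = hidden * kernel_size ** 2    # kxk depthwise conv: one filter per channel
    project = hidden * channels              # 1x1 pointwise conv: hidden -> C
    return {
        "expand": expand,
        "depthwise": depthwise,
        "project": project,
        "total": expand + depthwise + project,
    }


def irb_macs(channels: int, height: int, width: int,
             expansion: int = 4, kernel_size: int = 3) -> int:
    """Multiply-accumulates for one block at stride 1 with 'same' padding.

    Each bias-free conv weight is applied once per output position,
    so MACs = total weights * H * W.
    """
    return irb_param_count(channels, expansion, kernel_size)["total"] * height * width


if __name__ == "__main__":
    counts = irb_param_count(64)
    print(counts)                    # per-layer and total weight counts
    print(irb_macs(64, 14, 14))      # MACs on a 14x14 feature map
```

With 64 channels the two pointwise convolutions hold 16,384 weights each while the depthwise convolution holds only 2,304, which suggests why designs in this family spend most of their budget on channel mixing rather than spatial filtering.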

Code & Models

Repositories

zhangzjn/emov2 (PyTorch, official)

Taxonomy

Topics: Satellite Image Processing and Photogrammetry

Methods: Attention Is All You Need · Adam · Dropout · Feature Pyramid Network · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Depthwise Convolution · Byte Pair Encoding · Pointwise Convolution