MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Haoyang He; Jiangning Zhang; Yuxuan Cai; Hongxu Chen; Xiaobin Hu; Zhenye Gan; Yabiao Wang; Chengjie Wang; Yunsheng Wu; Lei Xie

arXiv:2411.15941·cs.CV·November 26, 2024·3 cites

TL;DR

MobileMamba is a lightweight Mamba-based visual network that targets both higher inference speed and higher accuracy on high-resolution visual tasks by combining a three-stage network design with a Multi-Receptive Field Feature Interaction (MRFFI) module.

Contribution

The paper introduces MobileMamba, a three-stage network with a new MRFFI module that balances efficiency and performance in lightweight visual models.

Findings

1. Achieves up to 83.6% Top-1 accuracy.
2. Surpasses existing models in speed and accuracy.
3. Runs up to 21x faster than LocalVim on GPU.

Abstract

Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs. CNNs, with their local receptive fields, struggle to capture long-range dependencies, while Transformers, despite their global modeling capabilities, are limited by quadratic computational complexity in high-resolution scenarios. Recently, state-space models have gained popularity in the visual domain due to their linear computational complexity. Despite their low FLOPs, current lightweight Mamba-based models exhibit suboptimal throughput. In this work, we propose the MobileMamba framework, which balances efficiency and performance. We design a three-stage network to enhance inference speed significantly. At a fine-grained level, we introduce the Multi-Receptive Field Feature Interaction (MRFFI) module, comprising the Long-Range Wavelet Transform-Enhanced Mamba (WTE-Mamba), Efficient…
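
The abstract's contrast between quadratic and linear complexity can be made concrete with a rough operation count. The following is a minimal sketch, not the paper's cost model: the function names, the patch size of 16, the channel width of 192, and the state size of 16 are all illustrative assumptions, and the formulas ignore projections, normalization, and memory traffic.

```python
def num_tokens(height: int, width: int, patch: int = 16) -> int:
    """Count non-overlapping patch tokens (patch size is an assumption)."""
    return (height // patch) * (width // patch)


def attention_cost(n_tokens: int, dim: int) -> int:
    """Rough multiply-accumulate count for the two N x N attention matmuls.

    QK^T and attention-times-V each cost about N^2 * d, so the total
    grows quadratically with the number of tokens.
    """
    return 2 * n_tokens * n_tokens * dim


def linear_scan_cost(n_tokens: int, dim: int, state: int = 16) -> int:
    """Rough cost of a linear-time sequence scan: one state update per token.

    The state size is a hypothetical value chosen only for illustration.
    """
    return n_tokens * dim * state


if __name__ == "__main__":
    dim = 192  # illustrative channel width, not taken from the paper
    for side in (224, 448, 896):
        n = num_tokens(side, side)
        print(
            f"{side}x{side}: tokens={n}, "
            f"attention~{attention_cost(n, dim)}, "
            f"scan~{linear_scan_cost(n, dim)}"
        )
```

Under these assumptions, doubling the image side quadruples the token count, which multiplies the attention estimate by 16 but the scan estimate by only 4. As the abstract notes, a low operation count does not by itself guarantee high throughput, so estimates like these say nothing about measured speed.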

Code & Models

Repositories

lewandofskee/MobileMamba (PyTorch, official)

Models

🤗 Lewandofski/MobileMamba (model, ♡ 1)

Taxonomy

Topics: Video Surveillance and Tracking Methods

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings