Mamba YOLO: A Simple Baseline for Object Detection with State Space Model
Zeyu Wang, Chen Li, Huiying Xu, Xinzhong Zhu, Hongbo Li

TL;DR
Mamba YOLO introduces a simple, efficient baseline for real-time object detection that uses a State Space Model (SSM) to reduce computational complexity while improving accuracy, achieving state-of-the-art results on COCO with fast inference.
Contribution
The paper presents ODMamba, a novel backbone built on a linear-complexity SSM, together with a multi-branch RG Block, enabling effective real-time object detection without pretraining.
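The linear complexity of an SSM comes from processing tokens with a recurrence instead of pairwise comparisons. The following is a minimal sketch of a generic Mamba-style selective scan, not the authors' ODMamba implementation: the function name `selective_scan`, the tensor shapes, and the simplified discretization (zero-order hold for a diagonal A, a first-order approximation for B) are illustrative assumptions. Each token updates a fixed-size hidden state once, so the cost grows linearly with the sequence length L.

```python
import numpy as np


def selective_scan(x, delta, A, B, C):
    """Minimal selective SSM scan over a 1-D token sequence (illustrative sketch).

    Assumed shapes (hypothetical, not ODMamba's actual layout):
      x:     (L, D)  sequence of L tokens with D channels
      delta: (L, D)  input-dependent, non-negative step sizes
      A:     (D, N)  diagonal state matrix per channel (negative entries are stable)
      B, C:  (L, N)  input-dependent input and output projections
    Returns y with shape (L, D).
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                          # hidden state carried across tokens
    y = np.empty((L, D))
    for t in range(L):                            # a single pass: O(L) steps
        dA = np.exp(delta[t][:, None] * A)        # zero-order-hold discretization of A
        dB = delta[t][:, None] * B[t][None, :]    # first-order approximation of B_bar
        h = dA * h + dB * x[t][:, None]           # h_t = A_bar * h_{t-1} + B_bar * x_t
        y[t] = h @ C[t]                           # y_t = C_t h_t
    return y


if __name__ == "__main__":
    # With A = 0 and delta = B = C = 1, the recurrence reduces to a running sum.
    x = np.array([[1.0], [2.0], [3.0]])
    ones = np.ones((3, 1))
    print(selective_scan(x, ones, np.zeros((1, 1)), ones, ones).ravel())
```

The Python loop is sequential for readability; Mamba itself evaluates the same recurrence with a hardware-aware parallel scan on the GPU. How ODMamba turns a 2-D feature map into such a sequence is not covered by this sketch.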
Findings
Achieves a 7.5% mAP improvement on the COCO benchmark.
Runs inference in 1.5 ms on a single 4090 GPU.
Outperforms previous methods in real-time object detection.
Abstract
Driven by the rapid development of deep learning technology, the YOLO series has set a new benchmark for real-time object detectors. Additionally, Transformer-based structures have emerged as the most powerful solution in the field, greatly extending the model's receptive field and achieving significant performance improvements. However, this improvement comes at a cost: the quadratic complexity of the self-attention mechanism increases the computational burden of the model. To address this problem, we introduce a simple yet effective baseline called Mamba YOLO. Our contributions are as follows: 1) We propose the ODMamba backbone, which introduces a State Space Model (SSM) with linear complexity to address the quadratic complexity of self-attention. Unlike other Transformer-based and SSM-based methods, ODMamba is simple to train without…
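To make the quadratic cost of self-attention concrete, the sketch below assumes single-head, unmasked scaled dot-product attention over an h × w feature map flattened into L = h·w tokens. The helper names (`self_attention`, `attention_entries`, `ssm_steps`) and the feature-map sizes are hypothetical and chosen only for illustration; they do not come from the paper. Attention materializes an L × L score matrix, so doubling the side of the feature map multiplies the number of scores by 16, whereas a linear-time SSM scan needs only L state updates, which grows by a factor of 4.

```python
import numpy as np


def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over L tokens (illustrative)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[1])         # (L, L) matrix: quadratic in L
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v, scores.size


def attention_entries(h, w):
    """Pairwise scores for an h x w map flattened to L = h * w tokens."""
    L = h * w
    return L * L


def ssm_steps(h, w):
    """Recurrent state updates for a linear-time scan over the same tokens."""
    return h * w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((6, 8))                # 6 tokens, 8 channels
    W = [rng.standard_normal((8, 8)) for _ in range(3)]
    out, n_scores = self_attention(x, *W)
    print(out.shape, n_scores)                     # (6, 8) 36
    for side in (20, 40, 80):
        # prints: 20 160000 400 / 40 2560000 1600 / 80 40960000 6400
        print(side, attention_entries(side, side), ssm_steps(side, side))
```

The counts ignore channel width and constant factors; they only illustrate how each approach scales with the number of tokens.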
Taxonomy
Topics: Computer Science and Engineering
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Sparse Evolutionary Training
