Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection

Haoxuan Wang; Qingdong He; Jinlong Peng; Hao Yang; Mingmin Chi; Yabiao Wang

arXiv:2409.08513·cs.CV·September 19, 2024

TL;DR

Mamba-YOLO-World introduces a novel feature fusion mechanism that enhances open-vocabulary object detection performance while maintaining efficiency, outperforming previous models on COCO and LVIS benchmarks.

Contribution

The paper proposes MambaFusion-PAN, a new feature fusion architecture with linear complexity, improving YOLO-based open-vocabulary detection.

Findings

01. Outperforms YOLO-World on COCO and LVIS in both zero-shot and fine-tuning settings.

02. Achieves better accuracy with fewer parameters and FLOPs.

03. Surpasses existing state-of-the-art OVD methods.

Abstract

Open-vocabulary detection (OVD) aims to detect objects beyond a predefined set of categories. As a pioneering model incorporating the YOLO series into OVD, YOLO-World is well-suited for scenarios prioritizing speed and efficiency. However, its performance is hindered by its neck feature fusion mechanism, which incurs quadratic complexity and limited guided receptive fields. To address these limitations, we present Mamba-YOLO-World, a novel YOLO-based OVD model employing the proposed MambaFusion Path Aggregation Network (MambaFusion-PAN) as its neck architecture. Specifically, we introduce an innovative State Space Model-based feature fusion mechanism consisting of a Parallel-Guided Selective Scan algorithm and a Serial-Guided Selective Scan algorithm with linear complexity and globally guided receptive fields. It leverages multi-modal input sequences and Mamba hidden states to…
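
To make the "linear complexity" claim concrete, the following is a minimal NumPy sketch of a generic Mamba-style selective scan: a single pass over the sequence that updates a fixed-size hidden state per step, so cost grows linearly with sequence length. It is not the paper's Parallel-Guided or Serial-Guided Selective Scan; the function name `selective_scan`, the tensor shapes, and the random inputs are all assumptions chosen for illustration.

```python
import numpy as np

def selective_scan(x, delta, A, B, C):
    """Sequential diagonal state-space scan (illustrative, not the paper's code).

    x:     (L, D) input sequence of L tokens with D channels
    delta: (L, D) positive, input-dependent step sizes
    A:     (D, N) diagonal state matrix per channel (negative for stability)
    B:     (L, N) per-step input projection
    C:     (L, N) per-step output projection
    Returns y with shape (L, D).
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))           # hidden state, size independent of L
    y = np.empty((L, D))
    for t in range(L):             # one pass: O(L * D * N) work overall
        dA = np.exp(delta[t][:, None] * A)        # discretized state decay
        dB = delta[t][:, None] * B[t][None, :]    # discretized input matrix
        h = dA * h + dB * x[t][:, None]           # state update
        y[t] = h @ C[t]                           # read out D channels
    return y

# Hypothetical toy inputs, only to show the call pattern.
rng = np.random.default_rng(0)
L, D, N = 6, 3, 4
x = rng.normal(size=(L, D))
delta = np.abs(rng.normal(size=(L, D))) + 0.1
A = -np.abs(rng.normal(size=(D, N)))
B = rng.normal(size=(L, N))
C = rng.normal(size=(L, N))
y = selective_scan(x, delta, A, B, C)
print(y.shape)  # (6, 3)
```

Because the hidden state summarizes every earlier token, each output position can depend on the whole preceding sequence without the pairwise token comparisons that make attention-style fusion quadratic.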

Code & Models

Repositories

Xuan-World/Mamba-YOLO-World (PyTorch, official)

Models

🤗 Xuan-World/Mamba-YOLO-World (model, ♡ 4)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings