Mamba-Adaptor: State Space Model Adaptor for Visual Recognition

Fei Xie; Jiahao Nie; Yujin Tang; Wenkang Zhang; Hongshen Zhao

arXiv:2505.12685·cs.CV·May 20, 2025

TL;DR

Mamba-Adaptor introduces a novel vision task adaptor for State Space Models, enhancing global context access, spatial modeling, and long-range memory, leading to state-of-the-art results in visual recognition tasks.

Contribution

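The paper proposes a lightweight vision adaptor, built from Adaptor-T and Adaptor-S modules, that targets three limitations of Mamba in visual tasks: no access to global context under causal computation, long-range forgetting, and weak spatial structure modeling.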

Findings

01. Achieves state-of-the-art results on ImageNet and COCO benchmarks.

02. Enhances global context access and spatial modeling in Mamba.

03. Effective as a general backbone, booster, and fine-tuning module.

Abstract

Recent State Space Models (SSMs), especially Mamba, have demonstrated impressive performance in visual modeling and possess superior model efficiency. However, applying Mamba to visual tasks yields inferior performance because of three main constraints of the sequential model: 1) causal computing cannot access global context; 2) long-range forgetting occurs when computing the current hidden states; 3) spatial structural modeling is weak because the input is transformed into a sequence. To address these issues, we investigate a simple yet powerful vision task Adaptor for Mamba models, which consists of two functional modules: Adaptor-T and Adaptor-S. When solving the hidden states for SSM, we apply a lightweight prediction module, Adaptor-T, to select a set of learnable locations as memory augmentations to ease long-range forgetting issues. Moreover, we leverage Adaptor-S,…
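
The three constraints listed in the abstract can be illustrated with a toy example. The sketch below is not the paper's method and not Mamba itself: it assumes a scalar linear recurrence with fixed, hypothetical coefficients `a`, `b`, `c` (Mamba makes these input-dependent) and a row-major flattening of the image grid. The helper names `causal_scan` and `flat_index` are made up for illustration.

```python
# Toy sketch only: scalar recurrence h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.
# a, b, c are fixed hypothetical values; a real selective SSM computes them per token.

def causal_scan(xs, a=0.5, b=1.0, c=1.0):
    """Run the recurrence left to right and return the outputs y_t."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # the state only sees tokens up to the current one
        ys.append(c * h)
    return ys


def flat_index(row, col, width):
    """Position of pixel (row, col) after an assumed row-major flattening."""
    return row * width + col


# 1) Causal computing: changing the last token leaves earlier outputs untouched,
#    so earlier positions have no access to later (global) context.
base = causal_scan([1.0, 2.0, 3.0, 4.0])
edited = causal_scan([1.0, 2.0, 3.0, 100.0])
earlier_outputs_unchanged = base[:3] == edited[:3]

# 2) Long-range forgetting: the first token's contribution decays as a**t.
impulse = causal_scan([1.0] + [0.0] * 9)
first_token_weight_at_end = impulse[-1]

# 3) Weak spatial structure: vertically adjacent pixels end up `width` steps apart.
width = 14
vertical_gap = flat_index(1, 0, width) - flat_index(0, 0, width)

print(earlier_outputs_unchanged, first_token_weight_at_end, vertical_gap)
```

With a decay of 0.5, the first token's weight after nine further steps is 0.5 to the ninth power, which is why a long flattened image sequence makes early patches hard to recall without some form of memory augmentation.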

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Sparse Evolutionary Training · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Balanced Selection