Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model

Yuheng Shi; Minjing Dong; Chang Xu

arXiv:2405.14174 · cs.CV · May 24, 2024 · 22 cites

TL;DR

Multi-Scale VMamba introduces a hierarchical vision model that combines multi-scale 2D scanning with convolutional feed-forward networks (ConvFFN) to improve both efficiency and accuracy in vision tasks, outperforming existing models on ImageNet and COCO and remaining competitive on ADE20K.
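As a rough illustration of what a convolutional feed-forward network adds, the sketch below assumes a PVTv2-style layout: a 1x1 channel expansion, a 3x3 depthwise convolution, GELU, and a 1x1 projection back. The layer order, the expansion ratio, and the omission of normalization and residual connections are assumptions, not details taken from this page, and the names `conv_ffn`, `pointwise`, and `depthwise3x3` are hypothetical. Plain Python lists keep it dependency-free; the official code is in PyTorch.

```python
import math
import random
from typing import List

# A feature map is a nested list indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def gelu(v: float) -> float:
    # Exact GELU via the Gaussian CDF.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def pointwise(x: FeatureMap, weight: List[List[float]]) -> FeatureMap:
    # 1x1 convolution: a linear layer applied at every pixel; weight is [out][in].
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wo[i] * x[i][r][c] for i in range(len(x))) for c in range(w)] for r in range(h)]
        for wo in weight
    ]


def depthwise3x3(x: FeatureMap, kernels: List[List[List[float]]]) -> FeatureMap:
    # One 3x3 kernel per channel (zero padding, stride 1): mixes space within a channel.
    h, w = len(x[0]), len(x[0][0])
    out = []
    for plane, k in zip(x, kernels):
        new = [[0.0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                acc = 0.0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w:
                            acc += k[dr + 1][dc + 1] * plane[rr][cc]
                new[r][c] = acc
        out.append(new)
    return out


def conv_ffn(x: FeatureMap, w1, dw, w2) -> FeatureMap:
    # Hypothetical ConvFFN: expand channels, depthwise 3x3 conv, GELU, project back.
    hidden = depthwise3x3(pointwise(x, w1), dw)
    hidden = [[[gelu(v) for v in row] for row in plane] for plane in hidden]
    return pointwise(hidden, w2)


random.seed(0)
C, HIDDEN, H, W = 4, 16, 5, 5  # toy sizes; real models are far larger
x = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(HIDDEN)]
dw = [[[random.gauss(0, 0.3) for _ in range(3)] for _ in range(3)] for _ in range(HIDDEN)]
w2 = [[random.gauss(0, 0.5) for _ in range(HIDDEN)] for _ in range(C)]
y = conv_ffn(x, w1, dw, w2)
print(len(y), len(y[0]), len(y[0][0]))  # prints: 4 5 5
```

The 1x1 layers mix information across channels at each pixel, while the depthwise convolution mixes each channel over its 3x3 neighborhood; placing a convolution inside the FFN is a common way to add local spatial context at low cost.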

Contribution

It proposes a novel hierarchical vision model with multi-scale 2D scanning and ConvFFN, enhancing long-range dependency learning while reducing computational costs.
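To see how multi-scale scanning can cut redundancy, the sketch below only builds the 1D token sequences that a 2D SSM would consume. It assumes a hypothetical layout: the standard four-route cross-scan (row-major, column-major, and both reversed), with route 0 kept at full resolution and routes 1 to 3 taken from a map downsampled 2x per side. The downsampling operator (plain striding here), the assignment of routes to scales, and the 56x56 map size (e.g. a 224-pixel input with a stride-4 stem) are assumptions; in a full model each route would be run through its own selective scan and the coarse outputs upsampled and merged, which is not shown.

```python
from typing import List

Grid = List[List[int]]


def cross_scan(grid: Grid) -> List[List[int]]:
    # Four 1D routes over an H x W grid: row-major, column-major, and both reversed.
    h, w = len(grid), len(grid[0])
    rows = [grid[r][c] for r in range(h) for c in range(w)]
    cols = [grid[r][c] for c in range(w) for r in range(h)]
    return [rows, cols, rows[::-1], cols[::-1]]


def downsample2(grid: Grid) -> Grid:
    # Stand-in for a stride-2 downsampling layer: keep every other row and column.
    return [row[::2] for row in grid[::2]]


def multi_scale_scan(grid: Grid) -> List[List[int]]:
    # Hypothetical layout: route 0 at full resolution, routes 1-3 on the 2x-downsampled map.
    full = cross_scan(grid)
    coarse = cross_scan(downsample2(grid))
    return [full[0]] + coarse[1:]


def total_tokens(routes: List[List[int]]) -> int:
    # Sequence length the SSM must process, summed over all routes.
    return sum(len(r) for r in routes)


grid = [[r * 56 + c for c in range(56)] for r in range(56)]  # 56 x 56 feature map
print(total_tokens(cross_scan(grid)))        # 12544 = 4 * 3136
print(total_tokens(multi_scale_scan(grid)))  # 5488 = 3136 + 3 * 784
```

Under these assumptions the scanned length drops to (1 + 3/4) / 4 = 7/16 of the full cross-scan, while every direction is still covered.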

Findings

01. Achieves 82.8% top-1 accuracy on ImageNet with MSVMamba-Tiny.

02. Outperforms existing models on COCO object detection and instance segmentation.

03. Demonstrates competitive results on ADE20K semantic segmentation.

Abstract

Despite the significant achievements of Vision Transformers (ViTs) in various vision tasks, they are constrained by the quadratic complexity of self-attention. Recently, State Space Models (SSMs) have garnered widespread attention due to their global receptive field and linear complexity with respect to the input length, demonstrating substantial potential across fields including natural language processing and computer vision. To improve the performance of SSMs in vision tasks, a multi-scan strategy is widely adopted, which introduces significant redundancy into SSMs. For a better trade-off between efficiency and performance, we analyze the underlying reasons behind the success of the multi-scan strategy, where long-range dependency plays an important role. Based on the analysis, we introduce Multi-Scale Vision Mamba (MSVMamba) to preserve the superiority of SSMs in vision tasks with limited parameters. It…
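A scalar, time-invariant toy version of the discretized SSM recurrence used by Mamba-style models helps make the abstract's two claims concrete: the scan is a single O(L) pass, and the weight of an earlier token on a later output decays geometrically with their distance in the scan. This is an illustrative assumption, not the paper's exact analysis: real selective SSMs use input-dependent step sizes and B/C projections and a multi-dimensional state, and the parameter values and helper names (`zoh`, `ssm_scan`, `influence`) here are hypothetical.

```python
import math
from typing import List, Tuple


def zoh(a: float, b: float, delta: float) -> Tuple[float, float]:
    # Zero-order-hold discretization of a scalar SSM dh/dt = a*h + b*x (a < 0).
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(x: List[float], a_bar: float, b_bar: float, c: float) -> List[float]:
    # h_t = a_bar*h_{t-1} + b_bar*x_t, y_t = c*h_t: one pass, linear in sequence length.
    h, ys = 0.0, []
    for xt in x:
        h = a_bar * h + b_bar * xt
        ys.append(c * h)
    return ys


def influence(distance: int, a_bar: float, b_bar: float, c: float) -> float:
    # Weight of x_s in y_t when t - s = distance: c * a_bar**distance * b_bar.
    return c * a_bar ** distance * b_bar


a_bar, b_bar = zoh(a=-1.0, b=1.0, delta=0.1)  # assumed toy parameters
c = 1.0
impulse = [1.0] + [0.0] * 99
y = ssm_scan(impulse, a_bar, b_bar, c)  # impulse response: y[t] == influence(t, ...)

# Vertically adjacent tokens are 56 steps apart in a row-major scan of a 56-wide map,
# but only 28 steps apart on the 2x-downsampled map.
print(influence(56, a_bar, b_bar, c) / influence(28, a_bar, b_bar, c))  # about exp(-2.8) ~ 0.0608
```

Because each output depends only on earlier tokens in the scan order, a single route cannot see tokens that come later; this is one motivation for scanning in several directions, and shortening scan distances through downsampling is one way to weaken the decay at lower cost.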

Code & Models

Repositories

yuhengsss/msvmamba (PyTorch, official)

Videos

Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model · SlidesLive

Taxonomy

Topics: Image Retrieval and Classification Techniques · Data Visualization and Analytics

Methods: Region Proposal Network · Convolution · Softmax · RoIAlign · Mask R-CNN