MHS-VM: Multi-Head Scanning in Parallel Subspaces for Vision Mamba
Zhongping Ji

TL;DR
This paper introduces MHS-VM, a novel multi-head scanning module for vision models that improves feature organization in 2D images, enhancing performance and reducing parameters in VM-UNet.
Contribution
The paper proposes a multi-head scan module with scan route attention for better 2D feature construction in vision models, replacing the 2D-Selective-Scan block in VM-UNet.
Findings
Significant performance improvements on visual tasks.
Reduced model parameters compared to original VM-UNet.
Effective organization of 2D features via multi-head scanning.
Abstract
Recently, State Space Models (SSMs), with Mamba as a prime example, have shown great promise for long-range dependency modeling with linear complexity. Vision Mamba and subsequent architectures have since been proposed, and they perform well on visual tasks. The crucial step in applying Mamba to visual tasks is to arrange 2D visual features into sequences. To effectively organize and construct visual features within the 2D image space through 1D selective scan, we propose a novel Multi-Head Scan (MHS) module. The embeddings extracted from the preceding layer are projected into multiple lower-dimensional subspaces. Within each subspace, the selective scan is then performed along distinct scan routes. The resulting sub-embeddings, obtained from the multi-head scan process, are integrated and ultimately projected back into the high-dimensional space.…
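The pipeline described in the abstract (project into subspaces, scan each subspace along its own route, merge, project back) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: `toy_scan` is a fixed-decay linear recurrence standing in for Mamba's selective scan, the four routes in `route_indices` are assumed examples, and the names `multi_head_scan`, `W_in`, and `W_out` are hypothetical. The scan route attention mentioned under Contribution is not modeled here.

```python
import numpy as np

def toy_scan(seq, decay=0.9):
    """Stand-in for a selective scan (assumption, not Mamba's kernel):
    h_t = decay * h_{t-1} + x_t, applied along axis 0 of a (T, d) array."""
    out = np.zeros_like(seq)
    h = np.zeros(seq.shape[1], dtype=seq.dtype)
    for t in range(seq.shape[0]):
        h = decay * h + seq[t]
        out[t] = h
    return out

def route_indices(H, W):
    """Four illustrative scan routes, each a permutation of the flattened H*W grid."""
    row = np.arange(H * W)                   # row-major, left to right
    col = row.reshape(H, W).T.reshape(-1)    # column-major, top to bottom
    return [row, col, row[::-1], col[::-1]]  # plus both reversed routes

def multi_head_scan(x, W_in, W_out):
    """x: (H, W, C) feature map.
    W_in: list of per-head projections, each of shape (C, d).
    W_out: (n_heads * d, C) projection back to the original width."""
    H, W, C = x.shape
    flat = x.reshape(H * W, C)
    heads = []
    for proj, idx in zip(W_in, route_indices(H, W)):
        sub = flat @ proj                 # project into a lower-dimensional subspace
        scanned = toy_scan(sub[idx])      # scan tokens in this head's route order
        restored = np.empty_like(scanned)
        restored[idx] = scanned           # undo the permutation: back to grid order
        heads.append(restored)
    merged = np.concatenate(heads, axis=1)    # integrate the sub-embeddings
    return (merged @ W_out).reshape(H, W, C)  # project back to C channels

# Usage: 4 heads of width 2 on a 4x5 map with 8 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 8))
W_in = [rng.standard_normal((8, 2)) for _ in range(4)]
W_out = rng.standard_normal((8, 8))
y = multi_head_scan(x, W_in, W_out)
```

Each head works in a subspace of width `d` rather than the full channel width `C`, which is one plausible reading of how the design keeps the parameter count down; the paper's actual projections and route set may differ.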
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage
