GroupMamba: Efficient Group-Based Visual State Space Model
Abdelrahman Shaker, Syed Talal Wasim, Salman Khan, Juergen Gall, Fahad Shahbaz Khan

TL;DR
This paper introduces GroupMamba, a parameter-efficient, SSM-based visual model with a novel group-wise scanning architecture and distillation training, achieving state-of-the-art results across multiple vision tasks with fewer parameters.
Contribution
The paper proposes a Modulated Group Mamba layer, in which independent SSM blocks each scan one spatial direction, together with a distillation-based training method aimed at scalable, stable, and efficient vision models.
Findings
Achieves 83.3% top-1 accuracy on ImageNet-1K with 23M parameters.
Outperforms existing methods on COCO object detection and segmentation.
Demonstrates superior efficiency and performance across multiple vision benchmarks.
Abstract
State-space models (SSMs) have recently shown promise in capturing long-range dependencies with subquadratic computational complexity, making them attractive for various applications. However, purely SSM-based models face critical challenges related to stability and achieving state-of-the-art performance in computer vision tasks. Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes. We introduce a parameter-efficient modulated group mamba layer that divides the input channels into four groups and applies our proposed SSM-based efficient Visual Single Selective Scanning (VSSS) block independently to each group, with each VSSS block scanning in one of the four spatial directions. The Modulated Group Mamba layer also wraps the four VSSS blocks into a channel modulation operator to improve…
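The channel grouping and four-direction scanning described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `toy_scan` is a hypothetical fixed-decay recurrence standing in for the VSSS block, the direction names and the function `group_directional_scan` are assumptions made for illustration, and the channel modulation operator is left out.

```python
import numpy as np

# Hypothetical names for the four spatial scan orders (an assumption, not from the paper).
DIRECTIONS = ("left_to_right", "right_to_left", "top_to_bottom", "bottom_to_top")


def toy_scan(seq, decay=0.5):
    """Placeholder recurrence h_t = decay * h_{t-1} + x_t along axis 1.

    Stands in for a selective SSM scan; it is NOT the VSSS block.
    """
    out = np.empty_like(seq)
    h = np.zeros(seq.shape[0], dtype=seq.dtype)
    for t in range(seq.shape[1]):
        h = decay * h + seq[:, t]
        out[:, t] = h
    return out


def to_sequence(g, direction):
    """Flatten a (C, H, W) group into (C, H*W) in the given scan order."""
    c = g.shape[0]
    if direction in ("left_to_right", "right_to_left"):
        seq = g.reshape(c, -1)                     # row by row
    else:
        seq = g.transpose(0, 2, 1).reshape(c, -1)  # column by column
    if direction in ("right_to_left", "bottom_to_top"):
        seq = seq[:, ::-1]                         # reverse the traversal
    return seq


def from_sequence(seq, direction, h, w):
    """Invert to_sequence: map (C, H*W) back to (C, H, W)."""
    c = seq.shape[0]
    if direction in ("right_to_left", "bottom_to_top"):
        seq = seq[:, ::-1]
    if direction in ("left_to_right", "right_to_left"):
        return seq.reshape(c, h, w)
    return seq.reshape(c, w, h).transpose(0, 2, 1)


def group_directional_scan(x, scan_fn=toy_scan):
    """Split channels into four groups and scan each group in one direction."""
    x = np.asarray(x, dtype=float)
    c, h, w = x.shape
    if c % 4 != 0:
        raise ValueError("channel count must be divisible by 4")
    outputs = []
    for group, direction in zip(np.split(x, 4, axis=0), DIRECTIONS):
        seq = to_sequence(group, direction)
        outputs.append(from_sequence(scan_fn(seq), direction, h, w))
    return np.concatenate(outputs, axis=0)


features = np.arange(4 * 2 * 3).reshape(4, 2, 3)  # 4 channels, 2x3 feature map
scanned = group_directional_scan(features)
```

Because each of the four scans sees only a quarter of the channels, the per-direction cost shrinks accordingly, which is one plausible reading of the parameter-efficiency claim; the actual savings depend on the VSSS block design.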
Taxonomy
Topics: Video Surveillance and Tracking Methods · Robotics and Automated Systems · Advanced Vision and Imaging
