GroupMamba: Efficient Group-Based Visual State Space Model

Abdelrahman Shaker; Syed Talal Wasim; Salman Khan; Juergen Gall; Fahad Shahbaz Khan

arXiv:2407.13772 · cs.CV · April 1, 2025 · 3 citations

TL;DR

This paper introduces GroupMamba, a parameter-efficient, SSM-based visual model with a novel group-wise scanning architecture and distillation training, achieving state-of-the-art results across multiple vision tasks with fewer parameters.

Contribution

The paper proposes a Modulated Group Mamba layer built from independent directional SSM blocks, together with a distillation-based training method, aiming at scalable, stable, and efficient vision models.

Findings

01. Achieves 83.3% top-1 accuracy on ImageNet-1K with 23M parameters.

02. Outperforms existing methods on COCO object detection and segmentation.

03. Demonstrates superior efficiency and performance across multiple vision benchmarks.

Abstract

State-space models (SSMs) have recently shown promise in capturing long-range dependencies with subquadratic computational complexity, making them attractive for various applications. However, purely SSM-based models face critical challenges related to stability and achieving state-of-the-art performance in computer vision tasks. Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes. We introduce a parameter-efficient modulated group mamba layer that divides the input channels into four groups and applies our proposed SSM-based efficient Visual Single Selective Scanning (VSSS) block independently to each group, with each VSSS block scanning in one of the four spatial directions. The Modulated Group Mamba layer also wraps the four VSSS blocks into a channel modulation operator to improve…
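The grouping idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: `toy_scan` is a hypothetical stand-in that replaces the learned selective SSM inside a VSSS block with a fixed linear recurrence, the four direction names and their exact traversal orders are assumptions, and `channel_modulate` is a placeholder gate rather than the paper's channel modulation operator. The sketch only shows how channels might be split into four groups, each scanned in its own direction.

```python
import math

# Assumed direction set; the abstract only says "four spatial directions".
DIRECTIONS = ["left_to_right", "right_to_left", "top_to_bottom", "bottom_to_top"]


def scan_order(height, width, direction):
    """Return the (row, col) visiting order for one assumed scan direction."""
    row_major = [(r, c) for r in range(height) for c in range(width)]
    col_major = [(r, c) for c in range(width) for r in range(height)]
    orders = {
        "left_to_right": row_major,
        "right_to_left": row_major[::-1],
        "top_to_bottom": col_major,
        "bottom_to_top": col_major[::-1],
    }
    return orders[direction]


def toy_scan(channel, direction, decay=0.5):
    """Hypothetical stand-in for a VSSS block.

    Runs the fixed recurrence state = decay * state + x along the scan order.
    A real block would use a learned, input-dependent state space model.
    """
    height, width = len(channel), len(channel[0])
    out = [[0.0] * width for _ in range(height)]
    state = 0.0
    for r, c in scan_order(height, width, direction):
        state = decay * state + channel[r][c]
        out[r][c] = state
    return out


def grouped_directional_scan(feature_map):
    """Split C channels into four groups and scan each group in one direction.

    feature_map is a list of C channels, each an H x W list of lists.
    """
    num_channels = len(feature_map)
    if num_channels == 0 or num_channels % 4 != 0:
        raise ValueError("channel count must be a positive multiple of 4")
    group_size = num_channels // 4
    return [
        toy_scan(channel, DIRECTIONS[index // group_size])
        for index, channel in enumerate(feature_map)
    ]


def channel_modulate(scanned, reference):
    """Placeholder gate: scale each scanned channel by sigmoid(mean of input).

    This is an assumption for illustration, not the paper's operator.
    """
    gated = []
    for out_channel, in_channel in zip(scanned, reference):
        values = [v for row in in_channel for v in row]
        gate = 1.0 / (1.0 + math.exp(-sum(values) / len(values)))
        gated.append([[gate * v for v in row] for row in out_channel])
    return gated
```

Calling `grouped_directional_scan` on a feature map with four channels scans channel 0 left to right, channel 1 right to left, channel 2 top to bottom, and channel 3 bottom to top. Because each direction sees only a quarter of the channels, the scanning cost per direction shrinks accordingly, which is the intuition behind the parameter efficiency claimed in the abstract.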

Code & Models

Repositories

Amshaker/GroupMamba
PyTorch · Official

Taxonomy

Topics: Video Surveillance and Tracking Methods · Robotics and Automated Systems · Advanced Vision and Imaging