StableMamba: Distillation-free Scaling of Large SSMs for Images and Videos
Hamid Suleman, Syed Talal Wasim, Muzammal Naseer, Juergen Gall

TL;DR
This paper introduces StableMamba, a distillation-free architecture for scaling large SSMs on images and videos. It interleaves Mamba and attention blocks, which improves accuracy and robustness over previous models.
Contribution
It proposes a novel Mamba-Attention interleaved architecture that addresses scalability issues of large SSMs without distillation, enhancing performance and robustness.
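As a rough illustration of what "interleaved" can mean at the level of a layer stack, the sketch below builds a list of block types in which attention blocks are placed at regular intervals among Mamba blocks. The function name `interleaved_schedule` and the parameters `depth` and `attention_every` are hypothetical, and the ratio used here is an assumption: this summary does not state how many Mamba blocks sit between attention blocks.

```python
from typing import List


def interleaved_schedule(depth: int, attention_every: int) -> List[str]:
    """Return block types for a stack of `depth` blocks.

    Every `attention_every`-th block is an attention block and the rest
    are Mamba blocks. Both parameters are illustrative assumptions, not
    values taken from the paper.
    """
    if depth <= 0 or attention_every <= 0:
        raise ValueError("depth and attention_every must be positive")
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba"
        for i in range(depth)
    ]


# Example: 8 blocks with an attention block after every 3 Mamba blocks.
schedule = interleaved_schedule(depth=8, attention_every=4)
print(schedule)
```

In a real model each entry would be mapped to an actual block module; the list only makes the ordering explicit.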
Findings
Improves accuracy on ImageNet-1K, Kinetics-400, and Something-Something-v2 benchmarks.
Resolves scalability issues of Mamba-based architectures for vision tasks.
Increases robustness to JPEG compression artifacts.
Abstract
State-space models (SSMs), exemplified by S4, have introduced a novel context modeling method by integrating state-space techniques into deep learning. However, they struggle with global context modeling due to their data-independent matrices. The Mamba model addressed this with data-dependent variants via the S6 selective-scan algorithm, enhancing context modeling, especially for long sequences. However, Mamba-based architectures are difficult to scale with respect to the number of parameters, which is a major limitation for vision applications. This paper addresses the scalability issue of large SSMs for image classification and action recognition without requiring additional techniques like knowledge distillation. We analyze the distinct characteristics of Mamba-based and Attention-based models, proposing a Mamba-Attention interleaved architecture that enhances scalability,…
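The difference between data-independent and data-dependent matrices can be made concrete with a scalar toy recurrence. The sketch below is a simplified illustration, not the S4 or S6 implementation: it uses a single scalar state, and the weights `w_delta`, `w_b`, and `w_c` are hypothetical stand-ins for the learned projections that produce the step size and the input and output parameters from each token.

```python
import math
from typing import List


def fixed_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Data-independent recurrence: the same a, b, c apply to every input."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # state update with fixed parameters
        ys.append(c * h)   # readout with a fixed parameter
    return ys


def softplus(z: float) -> float:
    """Smooth positive function, used here to keep the step size positive."""
    return math.log1p(math.exp(z))


def selective_scan(xs: List[float], A: float, w_delta: float,
                   w_b: float, w_c: float) -> List[float]:
    """Data-dependent recurrence: step size, input and output parameters
    are computed from the current input x (scalar toy version)."""
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size
        a_bar = math.exp(delta * A)    # discretized decay; A < 0 keeps it in (0, 1)
        b_bar = delta * (w_b * x)      # input-dependent input parameter
        h = a_bar * h + b_bar * x
        ys.append((w_c * x) * h)       # input-dependent readout
    return ys


# Impulse response of the fixed recurrence: a plain geometric decay.
print(fixed_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
# The selective version decides per token how much to write and to forget.
print(selective_scan([1.0, 0.0, 2.0], A=-1.0, w_delta=1.0, w_b=1.0, w_c=1.0))
```

In the fixed version, how quickly past inputs fade is set once by `a`. In the selective version the decay `a_bar` changes with every token, which is the property the abstract credits for better context modeling.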
Taxonomy
Topics: Random lasers and scattering media · Nanofabrication and Lithography Techniques · Photonic Crystals and Applications
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
