Sparse Mamba: Introducing Controllability, Observability, And Stability To Structural State Space Models
Emadeldeen Hamdan, Hongyi Pan, and Ahmet Enis Cetin

TL;DR
This paper enhances structured state space models for NLP by integrating controllability, observability, and stability into Mamba architectures, leading to improved performance, reduced parameters, and computational efficiency.
Contribution
It introduces controllability, observability, and stability into Mamba SSMs, reducing parameters and improving perplexity and training efficiency for NLP tasks.
Findings
Performs 5% better in perplexity
Reduces training time by 3%
Uses sparse, controllable, and stable A matrices
Abstract
Recent structured state space models (SSMs), such as Mamba and Mamba2, have outperformed transformers and large language models at small to medium scale while addressing their computational inefficiency. In this work, we introduce the concepts of controllability and observability to the original Mamba SSM architecture in our Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications. Moreover, we reinforce stability of the A matrix in Mamba2. The Mamba SSM architecture removes the need for the attention layers and multilayer perceptron blocks used in transformers. However, current Mamba models lack reinforcement of controllability in the state-space equations for computing the A, B, C, and D matrices at each time step, leading to increased complexity and computational costs. Furthermore, the A matrix in Mamba2 is not always stable. We demonstrate a…
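To make the three properties named in the abstract concrete, here is a minimal NumPy sketch using standard control-theory definitions. It is not the authors' implementation: the helper names, the choice of a companion (controllable canonical form) A matrix, and the example coefficients are all assumptions made for illustration. The sketch builds a sparse A matrix, checks controllability and observability with the Kalman rank tests, and checks continuous-time stability through the eigenvalues of A.

```python
import numpy as np

def companion_matrix(coeffs):
    """Hypothetical helper: controllable canonical form for the polynomial
    s^n + a_{n-1} s^{n-1} + ... + a_0, with coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)        # ones on the superdiagonal
    A[-1, :] = -np.asarray(coeffs)    # only the last row carries free parameters
    return A

def controllability_matrix(A, B):
    """Stack [B, AB, A^2 B, ..., A^(n-1) B] column-wise."""
    blocks = [B]
    for _ in range(A.shape[0] - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def is_controllable(A, B):
    """Kalman rank test: (A, B) is controllable iff the matrix has full rank n."""
    return np.linalg.matrix_rank(controllability_matrix(A, B)) == A.shape[0]

def is_observable(A, C):
    """By duality, (A, C) is observable iff (A^T, C^T) is controllable."""
    return is_controllable(A.T, C.T)

def is_stable_continuous(A):
    """Continuous-time stability: every eigenvalue has a negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))

# Example (assumed values): characteristic polynomial (s+1)(s+2)(s+3)
# = s^3 + 6 s^2 + 11 s + 6, so the eigenvalues are -1, -2, -3.
A = companion_matrix([6.0, 11.0, 6.0])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[1.0, 0.0, 0.0]])

print(is_controllable(A, B), is_observable(A, C), is_stable_continuous(A))
print("nonzero entries in A:", np.count_nonzero(A), "of", A.size)
```

In this form an n-by-n A matrix has at most 2n - 1 nonzero entries and only n free parameters, which gives one way to read the "sparse, controllable, and stable A matrices" finding. How the paper itself parameterizes and trains A is not specified in this summary.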
Taxonomy
Topics: Simulation Techniques and Applications
Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
