Sparse Mamba: Introducing Controllability, Observability, And Stability To Structural State Space Models

Emadeldeen Hamdan, Hongyi Pan, and Ahmet Enis Cetin

arXiv:2409.00563 · cs.LG · November 12, 2024

TL;DR

This paper enhances structured state space models for NLP by integrating controllability, observability, and stability into Mamba architectures, leading to improved performance, fewer parameters, and greater computational efficiency.

Contribution

It introduces controllability, observability, and stability into Mamba SSMs, reducing the parameter count and improving perplexity and training efficiency on NLP tasks.

Findings

01. Improves perplexity by 5%
02. Reduces training time by 3%
03. Uses sparse, controllable, and stable A matrices
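The summary does not say how S-Mamba builds its sparse, controllable A. As an illustration only, the sketch below assumes the controllable canonical (companion) form, a standard textbook construction: ones on the superdiagonal, the characteristic-polynomial coefficients in the last row, and B = e_n. Such an A has at most 2n - 1 nonzero entries out of n^2, and the pair (A, B) passes the Kalman rank test rank[B, AB, ..., A^{n-1}B] = n by construction. The function names (`companion`, `is_controllable`) are hypothetical and not taken from the paper's code; exact `Fraction` arithmetic is used so the rank test has no floating-point tolerance issues.

```python
from fractions import Fraction


def companion(coeffs):
    """Controllable canonical (companion) form of s^n + a_{n-1}s^{n-1} + ... + a_0.

    coeffs = [a_0, a_1, ..., a_{n-1}]. Returns (A, B) with B = e_n.
    """
    n = len(coeffs)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = Fraction(1)              # fixed shift structure on the superdiagonal
    A[n - 1] = [Fraction(-c) for c in coeffs]  # only the last row carries parameters
    B = [[Fraction(0)] for _ in range(n - 1)] + [[Fraction(1)]]
    return A, B


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def is_controllable(A, B):
    """Kalman rank test: rank [B, AB, ..., A^{n-1} B] == n."""
    n = len(A)
    blocks, AkB = [], B
    for _ in range(n):
        blocks.append(AkB)
        AkB = matmul(A, AkB)
    # Row i of the controllability matrix concatenates row i of every block.
    ctrb = [sum((blk[i] for blk in blocks), []) for i in range(n)]
    return rank(ctrb) == n


# s^3 + 6s^2 + 11s + 6 = (s + 1)(s + 2)(s + 3)
A, B = companion([6, 11, 6])
print(sum(1 for row in A for x in row if x != 0))  # 5 nonzeros out of 9 entries
print(is_controllable(A, B))                        # True

# Diagonal A with a repeated eigenvalue, driven by a single input
D = [[Fraction(-1), Fraction(0)], [Fraction(0), Fraction(-1)]]
print(is_controllable(D, [[Fraction(1)], [Fraction(1)]]))  # False
```

The last call shows the kind of rank deficiency the test detects: with a repeated eigenvalue and a single input, both states receive the same drive and evolve identically, so they cannot be steered independently.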

Abstract

Recent structured state space models (SSMs), such as Mamba and Mamba2, have outperformed transformers and large language models at small to medium scale while addressing their computational inefficiency. In this work, we introduce the concepts of controllability and observability into the original Mamba SSM architecture in our Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications. Moreover, we enforce stability on the $n \times n$ $A$ matrix in Mamba2. The Mamba SSM architecture removes the need for the attention layers and multilayer perceptron (MLP) blocks used in transformers. However, current Mamba models do not enforce controllability in the state-space equations used to compute the $A$, $B$, $C$, and $D$ matrices at each time step, which increases complexity and computational cost. Furthermore, the $A$ matrix in Mamba2 is not always stable. We demonstrate a…
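For a continuous-time state-space model $x'(t) = A x(t) + B u(t)$, $y(t) = C x(t) + D u(t)$, observability asks whether the state can be recovered from the output, and asymptotic stability requires every eigenvalue of $A$ to have a negative real part. The sketch below checks both under illustrative assumptions: observability through the Kalman rank test on $[C; CA; \dots; CA^{n-1}]$, and stability through the Routh-Hurwitz criterion applied to the characteristic polynomial $s^n + a_{n-1}s^{n-1} + \dots + a_0$, whose coefficients can be read directly off the last row of a companion-form $A$. It only tests these properties; it is not the paper's method for enforcing them, and the helper names (`is_observable`, `is_hurwitz`) are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def is_observable(A, C):
    """Kalman rank test: rank [C; CA; ...; C A^{n-1}] == n."""
    n = len(A)
    obsv, CAk = [], C
    for _ in range(n):
        obsv.extend(CAk)  # stack the rows of C A^k
        CAk = matmul(CAk, A)
    return rank(obsv) == n


def is_hurwitz(coeffs):
    """Routh-Hurwitz test for s^n + a_{n-1}s^{n-1} + ... + a_0, coeffs = [a_0, ..., a_{n-1}].

    True iff every root has a negative real part, i.e. x'(t) = A x(t) is
    asymptotically stable for a companion-form A with these coefficients.
    """
    n = len(coeffs)
    p = [Fraction(1)] + [Fraction(c) for c in reversed(coeffs)]  # highest degree first
    width = n // 2 + 1

    def pad(row):
        return row + [Fraction(0)] * (width - len(row))

    prev, cur = pad(p[0::2]), pad(p[1::2])  # Routh rows for s^n and s^{n-1}
    for _ in range(n):
        if cur[0] <= 0:  # monic polynomial: every first-column entry must be positive
            return False
        nxt = [(cur[0] * prev[j + 1] - prev[0] * cur[j + 1]) / cur[0]
               for j in range(width - 1)]
        prev, cur = cur, pad(nxt)
    return True


A = [[Fraction(0), Fraction(1)], [Fraction(-2), Fraction(-3)]]  # s^2 + 3s + 2
print(is_observable(A, [[Fraction(1), Fraction(0)]]))  # True
print(is_observable(A, [[Fraction(1), Fraction(1)]]))  # False: rank-deficient observability matrix
print(is_hurwitz([2, 3]))     # True: roots -1 and -2
print(is_hurwitz([6, 1, 1]))  # False: s^3 + s^2 + s + 6 has roots with positive real part
```

With $B = e_2$, the output matrix $C = [1, 1]$ gives the transfer function $(s + 1)/(s^2 + 3s + 2)$, whose zero cancels the pole at $s = -1$; that mode never reaches the output, which is why the observability test fails. If the model is discretized with a step $\Delta > 0$ via $\bar{A} = \exp(\Delta A)$, each eigenvalue $\lambda$ of $A$ maps to $e^{\Delta \lambda}$, whose magnitude is below 1 exactly when $\operatorname{Re}\lambda < 0$, so a Hurwitz $A$ yields a stable discrete recurrence.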

Taxonomy

Topics: Simulation Techniques and Applications

Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces