Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao, Albert Gu

TL;DR
This paper reveals a deep theoretical connection between Transformers and state-space models, introducing a new, faster architecture that maintains competitive performance in language modeling.
Contribution
It develops a unifying framework linking SSMs and attention mechanisms, and proposes Mamba-2, a more efficient model based on structured state space duality.
Findings
Mamba-2's core layer is 2-8X faster than Mamba's selective SSM.
Mamba-2 remains competitive with Transformers in language modeling.
Theoretical connections unify SSMs and attention mechanisms.
Abstract
While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
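The duality the abstract describes can be illustrated with a small numerical check. The sketch below is not the paper's implementation: it assumes a simplified single-channel SSM with a scalar decay per time step, and the names `ssm_recurrent`, `ssm_quadratic`, and `make_inputs` are hypothetical. It computes the same output two ways, once as a linear-time recurrence and once as a quadratic-time, attention-like product with a lower-triangular (semiseparable) matrix.

```python
import random


def ssm_recurrent(a, B, C, x):
    """Recurrent form: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = <C_t, h_t>."""
    n = len(B[0])
    h = [0.0] * n  # hidden state of size n, starts at zero
    ys = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        h = [a_t * h_k + b_k * x_t for h_k, b_k in zip(h, B_t)]
        ys.append(sum(c_k * h_k for c_k, h_k in zip(C_t, h)))
    return ys


def ssm_quadratic(a, B, C, x):
    """Attention-like form: y = M x, where for j <= i
    M[i][j] = <C_i, B_j> * a_{j+1} * ... * a_i  (and M[i][j] = 0 for j > i)."""
    T = len(x)
    ys = []
    for i in range(T):
        y_i = 0.0
        for j in range(i + 1):  # causal: only positions j <= i contribute
            decay = 1.0
            for k in range(j + 1, i + 1):
                decay *= a[k]
            score = sum(c * b for c, b in zip(C[i], B[j]))  # plays the role of Q K^T
            y_i += score * decay * x[j]
        ys.append(y_i)
    return ys


def make_inputs(T=6, n=4, seed=0):
    """Random toy inputs (assumed shapes: T time steps, state size n)."""
    rng = random.Random(seed)
    a = [rng.uniform(0.5, 1.0) for _ in range(T)]
    B = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(T)]
    C = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(T)]
    x = [rng.gauss(0, 1) for _ in range(T)]
    return a, B, C, x


if __name__ == "__main__":
    a, B, C, x = make_inputs()
    y_rec = ssm_recurrent(a, B, C, x)
    y_quad = ssm_quadratic(a, B, C, x)
    # The two forms should agree up to floating-point error.
    print(max(abs(p - q) for p, q in zip(y_rec, y_quad)))
```

In this toy setting, `C` and `B` play roles analogous to queries and keys, and the cumulative products of `a` act as a causal, decaying mask. The recurrent form costs time linear in sequence length, while the matrix form is quadratic but maps onto matrix multiplications.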
Code & Models
- 🤗 nvidia/mamba2-8b-3t-4k (model, ♡ 21)
- 🤗 nvidia/mamba2-hybrid-8b-3t-128k (model, ♡ 44)
- 🤗 nvidia/mamba2-hybrid-8b-3t-32k (model, ♡ 5)
- 🤗 nvidia/mamba2-hybrid-8b-3t-4k (model, ♡ 74)
- 🤗 nvidia/gpt3-8b-multi-3.5t-base (model, ♡ 8)
- 🤗 AntonV/mamba2-130m-av (model, 8 dl, ♡ 2)
- 🤗 AntonV/mamba2-370m-av (model, 2 dl, ♡ 1)
- 🤗 AntonV/mamba2-780m-av (model, 2 dl)
- 🤗 AntonV/mamba2-1.3b-av (model, 2 dl)
- 🤗 AntonV/mamba2-2.7b-av (model, ♡ 1)
Taxonomy
Topics: Neural Networks and Applications
