Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba
Yuchen Zou, Yineng Chen, Zuchao Li, Lefei Zhang, Hai Zhao

TL;DR
This paper provides a comprehensive survey of Mamba, a neural network architecture challenging Transformers, discussing its principles, improvements, integrations, and mathematical interpretations within the NLP field.
Contribution
It offers an extensive overview of Mamba's development, its foundational principles, and its potential to replace or complement Transformers in neural network architectures.
Findings
Mamba is based on structured state space models.
Mamba can potentially substitute Transformers.
Combining Transformers and Mamba mitigates their individual limitations.
Abstract
Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. Nevertheless, the recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of Transformers and Mamba to compensate for each other's shortcomings. We have also made efforts to interpret Mamba and Transformer in the framework of kernel functions, allowing…
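To make item (i) concrete, here is a minimal sketch of the linear recurrence behind structured state space models. It is an illustration under stated assumptions, not the survey's or Mamba's implementation: the function name `ssm_scan`, the diagonal state matrix, the scalar input channel, and the zero-order-hold discretisation are all choices made for this example. The parameters here are fixed across time steps; Mamba's selective mechanism instead makes the step size and the input and output projections depend on the current input.

```python
import math

def ssm_scan(xs, a, b, c, delta):
    """Run a diagonal linear state space model over a scalar sequence.

    Continuous form per state channel i (hypothetical names):
        h_i'(t) = a[i] * h_i(t) + b[i] * x(t),   y(t) = sum_i c[i] * h_i(t)
    Discretised with zero-order hold using step size `delta`.
    Assumes every a[i] is non-zero.
    """
    n = len(a)
    # Zero-order hold for a diagonal state matrix:
    #   a_bar = exp(delta * a),  b_bar = (a_bar - 1) / a * b
    a_bar = [math.exp(delta * a_i) for a_i in a]
    b_bar = [(ab_i - 1.0) / a_i * b_i for ab_i, a_i, b_i in zip(a_bar, a, b)]

    h = [0.0] * n  # hidden state, one value per channel
    ys = []
    for x in xs:
        # Recurrent update: h_t = a_bar * h_{t-1} + b_bar * x_t
        h = [a_bar[i] * h[i] + b_bar[i] * x for i in range(n)]
        # Readout: y_t = c . h_t
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys

# Usage: a single decaying state channel driven by a unit impulse.
outputs = ssm_scan([1.0, 0.0, 0.0], a=[-1.0], b=[1.0], c=[1.0], delta=1.0)
```

Each step costs a fixed amount of work in the state size, so the whole scan is linear in sequence length, in contrast to the quadratic cost of full self-attention.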
Taxonomy
Topics: Global Maritime and Colonial Histories
Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer · Absolute Position Encodings
