The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity

Yifang Chen; Xiaoyu Li; Yingyu Liang; Zhenmei Shi; Zhao Song

arXiv:2412.06148·cs.CC·February 21, 2025

TL;DR

This paper uses circuit complexity to show that Mamba and state-space models (SSMs) have the same theoretical computational limits as Transformers, challenging the assumption that they are more expressive.

Contribution

It provides rigorous proofs that Mamba and SSMs with $\mathrm{poly}(n)$ precision and constant-depth layers lie within the $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$ complexity class, which places an upper bound on their computational power.

Findings

01

Mamba and SSMs are in the $\mathsf{TC}^0$ complexity class.

02

If $\mathsf{TC}^0 \neq \mathsf{NC}^1$, they cannot solve problems such as arithmetic formula evaluation, Boolean formula value, and permutation composition.

03

Theoretically, Mamba has the same computational capabilities as Transformers.
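The conditional nature of these findings can be written out as a short chain of implications. This is a sketch of the standard argument, not a restatement of the paper's proofs: $L$ stands for any problem that is $\mathsf{NC}^1$-complete under reductions computable in $\mathsf{TC}^0$ (an assumption of this sketch), and the first line records well-known relationships between the classes.

```latex
\[
\mathsf{AC}^0 \subsetneq \mathsf{TC}^0 \subseteq \mathsf{NC}^1
\]
\[
\text{Mamba},\ \text{SSM} \in \mathsf{DLOGTIME}\text{-uniform } \mathsf{TC}^0
\]
\[
L \in \mathsf{TC}^0 \ \text{and}\ L\ \mathsf{NC}^1\text{-complete}
\;\Longrightarrow\; \mathsf{NC}^1 \subseteq \mathsf{TC}^0
\;\Longrightarrow\; \mathsf{TC}^0 = \mathsf{NC}^1
\]
\[
\text{hence}\quad \mathsf{TC}^0 \neq \mathsf{NC}^1
\;\Longrightarrow\; L \notin \mathsf{TC}^0
\;\Longrightarrow\; \text{no such Mamba or SSM solves } L .
\]
```

Whether $\mathsf{TC}^0$ and $\mathsf{NC}^1$ actually differ is an open question, so the hardness statements hold only under that assumption.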

Abstract

In this paper, we analyze the computational limitations of Mamba and state-space models (SSMs) using the circuit complexity framework. Despite Mamba's stateful design and recent attention as a strong candidate to outperform Transformers, we demonstrate that both Mamba and SSMs with $\mathrm{poly}(n)$-precision and constant-depth layers reside within the $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^{0}$ complexity class. This result indicates that Mamba theoretically has the same computational capabilities as Transformers, and that it cannot solve problems such as arithmetic formula problems, Boolean formula value problems, and permutation composition problems if $\mathsf{TC}^{0} \neq \mathsf{NC}^{1}$. It therefore challenges the assumption that Mamba is more computationally expressive than Transformers. Our contributions include rigorous proofs showing that Selective SSM and Mamba architectures can be…
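To make the permutation composition problem named in the abstract concrete, here is a minimal Python sketch. It is an illustration only, not a construction from the paper: the names `compose` and `compose_all`, the tuple encoding of permutations, and the choice of five elements are all assumptions made for this example. The five-element case is used because the word problem for the symmetric group $S_5$ is a classic $\mathsf{NC}^1$-complete problem.

```python
from functools import reduce
from typing import Sequence, Tuple

# Hypothetical encoding: a permutation of {0, ..., n-1} is a tuple p
# where p[i] is the image of i.
Perm = Tuple[int, ...]


def compose(p: Perm, q: Perm) -> Perm:
    """Return the permutation that applies p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def compose_all(perms: Sequence[Perm], n: int = 5) -> Perm:
    """Compose a sequence of permutations left to right.

    This is a plain sequential evaluator: the running product is a
    state that is updated once per input permutation.
    """
    identity = tuple(range(n))
    return reduce(compose, perms, identity)


if __name__ == "__main__":
    swap = (1, 0, 2, 3, 4)    # exchanges 0 and 1
    cycle = (1, 2, 3, 4, 0)   # sends i to (i + 1) mod 5
    print(compose_all([swap, cycle]))   # (2, 1, 3, 4, 0)
    print(compose_all([cycle] * 5))     # (0, 1, 2, 3, 4), the identity
```

The loop is trivial to run sequentially, with one state update per input. The abstract's claim concerns a different resource: if $\mathsf{TC}^{0} \neq \mathsf{NC}^{1}$, no constant-depth, polynomial-precision Mamba or SSM computes this product for every input length.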

Taxonomy

Topics: Cellular Automata and Applications

Methods: Attention Is All You Need · Adam · Dropout · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Label Smoothing