The Illusion of State in State-Space Models
William Merrill, Jackson Petty, Ashish Sabharwal

TL;DR
This paper demonstrates that state-space models (SSMs) do not have a significant expressive advantage over transformers in state tracking, limiting their effectiveness for complex sequential tasks.
Contribution
The paper provides a formal analysis showing SSMs are limited to the complexity class TC^0, similar to transformers, and reports experimental evidence of their struggles with state tracking.
Findings
SSMs cannot express computation outside TC^0
SSMs struggle with state tracking in experiments
SSMs have similar limitations to transformers in sequential tasks
Abstract
State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill & Sabharwal, 2023), which SSMs are explicitly designed to address via their close architectural similarity to recurrent neural networks (RNNs). But do SSMs truly have an advantage (over transformers) in expressive power for state tracking? Surprisingly, the answer is no. Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class TC^0. In particular, this means they cannot solve simple state-tracking problems like permutation composition. It follows that SSMs are…
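To make "permutation composition" concrete, the following is a minimal sketch of the task as a state-tracking problem: given a sequence of permutations, report the running composition after each step. The function names `compose` and `track_state` and the tuple encoding are illustrative assumptions, not taken from the paper, and the paper's exact experimental setup is not described in this summary.

```python
# Illustrative sketch only: names and encoding are assumptions, not the paper's code.
# A permutation of n items is a tuple p where p[i] is the image of i.

def compose(p, q):
    """Return the permutation that applies p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def track_state(seq, n):
    """Return the running composition after each permutation in seq."""
    state = tuple(range(n))  # start from the identity permutation
    prefix = []
    for p in seq:
        state = compose(state, p)  # sequential update: each step depends on the last
        prefix.append(state)
    return prefix

swap = (1, 0, 2)   # exchange items 0 and 1
cycle = (1, 2, 0)  # rotate the three items
print(track_state([swap, cycle], 3))  # [(1, 0, 2), (2, 1, 0)]
```

The loop is trivially sequential: each state is computed from the previous one. The abstract's claim is that a model confined to TC^0 cannot express this computation for simple state-tracking problems like this one, even though a recurrent update like the one above handles it directly.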
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Adversarial Robustness in Machine Learning
