The Illusion of State in State-Space Models

William Merrill; Jackson Petty; Ashish Sabharwal

arXiv:2404.08819 · cs.LG · March 7, 2025 · 1 citation

TL;DR

This paper shows that state-space models (SSMs) have no significant expressive advantage over transformers for state tracking, which limits their effectiveness on tasks that require sequential computation.

Contribution

The paper gives a formal analysis showing that SSMs, like transformers, are limited to the complexity class TC^0, and it reports experiments in which SSMs struggle with state tracking.

Findings

01 SSMs cannot express computation outside TC^0

02 SSMs struggle with state tracking in experiments

03 SSMs have similar limitations to transformers in sequential tasks

Abstract

State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill & Sabharwal, 2023), which SSMs are explicitly designed to address via their close architectural similarity to recurrent neural networks (RNNs). But do SSMs truly have an advantage (over transformers) in expressive power for state tracking? Surprisingly, the answer is no. Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $TC^{0}$. In particular, this means they cannot solve simple state-tracking problems like permutation composition. It follows that SSMs are…
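
To make "permutation composition" concrete, here is a minimal Python sketch of the task as a sequential state-tracking problem: given a sequence of permutations, report the running composition after each step. The function names (`compose`, `track_state`) and the choice of five elements are illustrative assumptions, not taken from the paper's code. The loop is the kind of step-by-step state update the abstract says SSMs cannot express.

```python
import random
from itertools import permutations


def compose(p, q):
    """Return the permutation that applies p first, then q.

    Permutations are tuples: p[i] is the image of i under p.
    """
    return tuple(q[j] for j in p)


def track_state(perms):
    """Return the running composition after each permutation in perms."""
    n = len(perms[0])
    state = tuple(range(n))  # start from the identity permutation
    prefix_states = []
    for p in perms:
        state = compose(state, p)  # one sequential state update per token
        prefix_states.append(state)
    return prefix_states


# Illustrative setup (an assumption): all 120 permutations of five elements.
S5 = list(permutations(range(5)))

rng = random.Random(0)
sequence = [rng.choice(S5) for _ in range(8)]
labels = track_state(sequence)  # target output at each position

for perm, state in zip(sequence, labels):
    print(perm, "->", state)
```

Each entry of `labels` depends on every permutation before it, so a model solving the task must carry the full composed permutation forward as its state.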

Code & Models

Repositories

benjamin-walker/selective-ssms-and-linear-cdes
jax

Models

BeeGass/Group-Theory-Collection
model

Datasets

BeeGass/Group-Theory-Collection
dataset · 1.4k dl

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Adversarial Robustness in Machine Learning