The Expressive Limits of Diagonal SSMs for State-Tracking

Mehran Shakerinava; Behnoush Khavari; Siamak Ravanbakhsh; Sarath Chandar

arXiv:2603.01959·cs.LG·March 3, 2026

Open Access · 3 Reviews

TL;DR

This paper investigates the theoretical expressive limits of diagonal state-space models (SSMs) for sequence tasks, revealing their inability to represent certain group structures and highlighting a gap between their theoretical capacity and practical learnability.

Contribution

It provides a rigorous group-theoretic analysis of the expressivity of multi-layer diagonal SSMs, identifying their limitations in representing non-Abelian group state-tracking.
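As rough intuition for why non-Abelian groups are the hard case, the sketch below contrasts composition in S_3, where order matters, with products of diagonal complex transitions, where it does not. This is an illustration only, not the paper's argument, which concerns full DCD SSMs at finite precision. The transition values `a_swap` and `a_cycle` are hypothetical.

```python
import cmath

def compose(p, q):
    """Return the permutation 'apply q first, then p' (tuples as maps i -> p[i])."""
    return tuple(p[q[i]] for i in range(len(q)))

# Two generators of S_3: a transposition and a 3-cycle.
swap = (1, 0, 2)
cycle = (1, 2, 0)

# S_3 is non-Abelian: the two orders of composition give different results.
s3_commutes = compose(swap, cycle) == compose(cycle, swap)

# Hypothetical diagonal transitions, one complex entry per state dimension.
a_swap = [cmath.exp(1j * 0.7), cmath.exp(1j * 2.1)]
a_cycle = [cmath.exp(1j * 1.3), cmath.exp(1j * 0.4)]

def diag_product(a, b):
    """Elementwise product, i.e. the product of two diagonal matrices."""
    return [x * y for x, y in zip(a, b)]

# Diagonal matrices always commute, so the product ignores input order.
ab = diag_product(a_swap, a_cycle)
ba = diag_product(a_cycle, a_swap)
diag_commutes = all(abs(x - y) < 1e-12 for x, y in zip(ab, ba))

print("S_3 generators commute:", s3_commutes)
print("diagonal transitions commute:", diag_commutes)
```

The multiplicative part of a diagonal recurrence therefore cannot distinguish "swap then cycle" from "cycle then swap", which a tracker for S_3 has to do.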

Findings

01

Single-layer DCD SSMs cannot express non-Abelian group state-tracking.

02

Multi-layer DCD SSMs can only express groups with a subnormal series of Abelian factors.

03

Empirically, multi-layer models often fail to learn non-Abelian group state-tracking.
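The depth condition in the second finding can be made concrete for small permutation groups. The sketch below computes the derived series (repeated commutator subgroups) by brute force; for a solvable group its length is the shortest possible series with Abelian factors. The helper names are hypothetical and this is not code from the paper.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(generators, n):
    """Close a set of permutations of n points under composition."""
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def commutator_subgroup(group, n):
    """Subgroup generated by all commutators a b a^-1 b^-1."""
    commutators = {
        compose(compose(a, b), compose(inverse(a), inverse(b)))
        for a in group for b in group
    }
    return generated_subgroup(commutators, n)

def derived_length(group, n):
    """Steps for the derived series to reach the trivial group; None if it stalls (not solvable)."""
    length = 0
    current = set(group)
    while len(current) > 1:
        nxt = commutator_subgroup(current, n)
        if len(nxt) == len(current):
            return None
        current = nxt
        length += 1
    return length

z3 = generated_subgroup({(1, 2, 0)}, 3)   # cyclic group of order 3
s3 = set(permutations(range(3)))          # symmetric group S_3
s4 = set(permutations(range(4)))          # symmetric group S_4

print("Z_3:", derived_length(z3, 3))
print("S_3:", derived_length(s3, 3))
print("S_4:", derived_length(s4, 4))
```

Under this reading, Z_3 falls in the single-layer regime (length 1), S_3 needs two layers (length 2), and S_4 needs three (length 3).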

Abstract

State-Space Models (SSMs) have recently been shown to achieve strong empirical performance on a variety of long-range sequence modeling tasks while remaining efficient and highly parallelizable. However, the theoretical understanding of their expressive power remains limited. In this work, we study the expressivity of input-Dependent Complex-valued Diagonal (DCD) SSMs on sequential state-tracking tasks. We show that single-layer DCD SSMs cannot express state-tracking of any non-Abelian group at finite precision. More generally, we show that $k$-layer DCD SSMs can express state-tracking of a group if and only if that group has a subnormal series of length $k$ with Abelian factors. That is, we identify the precise expressivity range of $k$-layer DCD SSMs within the solvable groups. Empirically, we find that multi-layer models often fail to learn state-tracking for non-Abelian groups,…
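To make the state-tracking task concrete, here is a minimal sketch of the Abelian case: a one-dimensional, input-dependent complex recurrence $h_t = a(x_t)\,h_{t-1}$ that tracks the running sum in the cyclic group $\mathbb{Z}_n$ by rotating a unit complex number. The function name, the rotation encoding, and the angle-based decoder are assumptions made for illustration; the paper's DCD parameterization may differ.

```python
import cmath
import math

def track_cyclic_group(inputs, n):
    """Track the running sum mod n with a one-dimensional complex diagonal recurrence."""
    h = 1 + 0j  # initial state encodes the identity element 0
    outputs = []
    for x in inputs:
        a = cmath.exp(2j * math.pi * x / n)  # input-dependent transition with |a| = 1
        h = a * h                            # h_t = a(x_t) * h_{t-1}
        # Decode: map the phase of h back to the nearest group element.
        angle = cmath.phase(h) % (2 * math.pi)
        outputs.append(round(angle * n / (2 * math.pi)) % n)
    return outputs

inputs = [2, 4, 1, 3, 4]
predicted = track_cyclic_group(inputs, n=5)

# Reference answer computed directly as prefix sums mod 5.
reference = []
total = 0
for x in inputs:
    total = (total + x) % 5
    reference.append(total)

print(predicted == reference)
```

Because every transition is a rotation, the state stays on the unit circle and the order of inputs does not affect the final state, which is exactly what an Abelian group permits.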

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The authors provide clear if-and-only-if theorems that link depth to group structure: one layer corresponds to Abelian groups, and k layers to series of length <= k. The paper also gives a good two-layer demonstration for S_3 that illustrates the theory, and presents the interesting observation that expressivity does not directly translate into learnability in practice.

Weaknesses

- It is uncertain whether the observations in this paper will carry over to real-world benchmarks such as language modeling.
- The experiments are limited and do not report results across different state dimensions, precisions, and decoders.
- The training details in the paper are unclear, and it is unclear whether expressivity != learnability actually holds.

Reviewer 02 · Rating 4 · Confidence 2

Strengths

The paper improves the understanding of the expressivity of State Space Models (SSMs) by focusing on a specific type of SSM and a particular data type (Abelian vs. non-Abelian groups). This seems to be a fresh perspective on analyzing expressivity, and the theory appears to be rigorous.

Weaknesses

One concern is the significance of the result. Does this make the SSM architecture more (or less) expressive than a transformer? Does this have any practical implications for how we should train SSMs? Another point that makes the significance of the theory harder to judge is that the experimental results do not directly support the theory; instead, they suggest that even when the models can in theory learn certain tasks, optimization fails to do so. This is not in

Reviewer 03 · Rating 4 · Confidence 2

Strengths

1. The paper provides necessary and sufficient conditions for when diagonal SSMs can track groups, not just sufficient conditions or impossibility results.
2. The connection to group theory seems an elegant way to relate architectural constraints to algebraic properties.
3. The paper doesn't just prove existence; it shows explicit constructions demonstrating how multi-layer diagonal SSMs can track non-Abelian groups.
4. The experimental section reveals an important gap between expressivity and l

Weaknesses

1. The experiments only test on 5 groups and don't explore what makes some solvable groups learnable vs. others. More extensive experiments would strengthen claims about the learnability gap.
2. While the paper identifies that multi-layer models fail to learn non-Abelian groups, it doesn't deeply investigate why or propose solutions beyond noting "optimization difficulties".
3. State-tracking is a specific synthetic task family. The paper doesn't clearly connect these limitations to practical sequ

Taxonomy

Topics: Machine Learning in Healthcare · Time Series Analysis and Forecasting · Machine Learning and Algorithms