Flash PD-SSM: Memory-Optimized Structured Sparse State-Space Models

Aleksandar Terzić; Francesco Carzaniga; Nicolas Menet; Yannick Biehl; Michael Hersche; Thomas Hofmann; Abbas Rahimi

arXiv:2605.19150·cs.LG·May 20, 2026

TL;DR

Flash PD-SSM is a memory-efficient structured sparse state-space model that pairs high expressivity with throughput comparable to existing structured SSMs, setting state-of-the-art accuracy on long-sequence time-series tasks and improving language modeling performance.

Contribution

The paper proposes a structured sparse SSM that keeps a trainable set of structured sparse matrices and discretely selects one of them at each time-step, balancing expressivity and efficiency for large-scale training.

Findings

01 Achieves throughput comparable to existing structured SSMs, with better expressivity guarantees.

02 Sets new state-of-the-art accuracy on long-sequence time-series tasks.

03 Improves language modeling performance and efficiency when used as a drop-in replacement.

Abstract

State-space models (SSMs) face a fundamental trade-off between efficiency and expressivity that is mainly dictated by the structure of the model's transition matrix. Unstructured transition matrices enable maximal expressivity, as measured by their ability to model finite-state automaton (FSA) transitions, but come at a prohibitively high compute and memory cost. In contrast, most structured transition matrix forms are highly efficient both in runtime and memory consumption, but suffer from limited expressivity. Building on recent work on structured sparse SSMs, we propose Flash PD-SSM, a novel SSM that achieves comparable throughput to widely used structured SSMs with significantly better expressivity guarantees. Flash PD-SSM maintains a trainable set of structured sparse matrices, a single one of which is discretely selected at each time-step, enabling FSA expressiveness at the level…
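
The selection mechanism described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes each structured sparse matrix has exactly one nonzero per column and is stored as index/value lists, and it assumes the selection index at each time-step is simply the input symbol. The names `apply_sparse`, `run`, and `parity` are hypothetical. Under those assumptions, picking one matrix per time-step reproduces the transitions of a two-state parity automaton while each step costs O(N) rather than O(N^2).

```python
# Toy sketch (assumption): a "structured sparse" matrix has one nonzero per
# column, stored as (row_indices, values) instead of a dense N x N array.

def apply_sparse(matrix, state):
    """Multiply a one-nonzero-per-column matrix by a state vector in O(N)."""
    rows, vals = matrix
    out = [0.0] * len(state)
    for col, (row, val) in enumerate(zip(rows, vals)):
        # Column `col` has its single nonzero `val` in row `row`.
        out[row] += val * state[col]
    return out


def run(matrices, selections, state):
    """Apply one discretely selected matrix per time-step."""
    for k in selections:
        state = apply_sparse(matrices[k], state)
    return state


# Fixed here for illustration; in a trained model these would be learned.
IDENTITY = ([0, 1], [1.0, 1.0])  # symbol 0: stay in the current state
SWAP = ([1, 0], [1.0, 1.0])      # symbol 1: move to the other state
MATRICES = [IDENTITY, SWAP]


def parity(bits):
    """Return 0 for an even number of ones in `bits`, 1 for an odd number."""
    final = run(MATRICES, bits, [1.0, 0.0])  # start in state 0 (one-hot)
    return final.index(1.0)


if __name__ == "__main__":
    print(parity([1, 0, 1, 1]))
```

Because every matrix in this toy set is a permutation, the state stays one-hot and the final state can be read off directly. How Flash PD-SSM parameterizes, selects, and trains its matrices is not specified in the truncated abstract, so the storage format and selection rule above are assumptions made only for illustration.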
