Flash PD-SSM: Memory-Optimized Structured Sparse State-Space Models
Aleksandar Terzić, Francesco Carzaniga, Nicolas Menet, Yannick Biehl, Michael Hersche, Thomas Hofmann, Abbas Rahimi

TL;DR
Flash PD-SSM is a memory-efficient structured sparse state-space model that pairs high expressivity with state-of-the-art accuracy on long-sequence time-series tasks and improved language modeling performance.
Contribution
The paper proposes a structured sparse SSM that discretely selects one matrix from a trainable set at each time-step, balancing expressivity and efficiency for large-scale training.
Findings
Achieves comparable throughput to existing structured SSMs with better expressivity.
Sets new state-of-the-art accuracy on long-sequence time-series tasks.
Improves language modeling performance and efficiency when used as a drop-in replacement.
Abstract
State-space models (SSMs) face a fundamental trade-off between efficiency and expressivity that is mainly dictated by the structure of the model's transition matrix. Unstructured transition matrices enable maximal expressivity, as measured by their ability to model finite-state automaton (FSA) transitions, but come at a prohibitively high compute and memory cost. In contrast, most structured transition matrix forms are highly efficient both in runtime and memory consumption, but suffer from limited expressivity. Building on recent work on structured sparse SSMs, we propose Flash PD-SSM, a novel SSM that achieves comparable throughput to widely-used structured SSMs with significantly better expressivity guarantees. Flash PD-SSM maintains a trainable set of structured sparse matrices, a single one of which is discretely selected at each time-step, enabling FSA expressiveness at the level…
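The selection mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each structured sparse matrix has exactly one nonzero per column (a one-hot column pattern scaled by a diagonal), which is an assumption here because the abstract is truncated before the structure is spelled out. The names apply_sparse and run, and the choice to select matrices directly by input symbol, are hypothetical and used only for illustration. The example tracks a two-state parity automaton, a simple FSA.

```python
import numpy as np

def apply_sparse(rows, vals, h):
    """Compute A @ h, where column j of A holds one nonzero vals[j] at row rows[j].

    Assumed structure (not confirmed by the abstract): one nonzero per column.
    Storing only (rows, vals) costs O(N) memory instead of O(N^2).
    """
    out = np.zeros_like(h)
    np.add.at(out, rows, vals * h)  # scatter-add handles repeated row indices
    return out

def run(matrices, selections, h0):
    """Apply one discretely selected matrix per time-step."""
    h = h0
    for k in selections:
        rows, vals = matrices[k]
        h = apply_sparse(rows, vals, h)
    return h

# Hypothetical matrix set for a parity automaton over bits.
matrices = [
    (np.array([0, 1]), np.array([1.0, 1.0])),  # bit 0: identity (stay)
    (np.array([1, 0]), np.array([1.0, 1.0])),  # bit 1: swap the two states
]

bits = [1, 0, 1, 1]           # three ones, so parity is odd
h0 = np.array([1.0, 0.0])     # start in the "even" state
h = run(matrices, bits, h0)
print(h)                      # one-hot vector marking the "odd" state
```

In a trained model the index chosen at each step would come from a learned, input-dependent selector rather than from the raw input symbol; the sketch only shows why a per-step discrete choice among sparse matrices is enough to follow automaton transitions at linear cost in the state size.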
