Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu, Karan Goel, Christopher Ré

TL;DR
The paper introduces S4, a structured state space model that efficiently captures long-range dependencies in sequence data, outperforming prior models in speed and accuracy across various benchmarks.
Contribution
Proposes the S4 model, built on a novel parameterization of the state matrix as a normal matrix plus a low-rank correction, which allows it to be diagonalized stably and enables efficient computation and strong performance on long-sequence tasks.
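To make the state-matrix structure concrete, the sketch below checks it on the HiPPO-LegS matrix that S4 uses for A. It assumes the LegS formula and a rank-1 factor P[i] = sqrt(i + 1/2), following our reading of the paper's construction. The function names are hypothetical, and the code only verifies that A + P P^T is normal. It does not implement S4's diagonalization or its Cauchy-kernel algorithm.

```python
import math


def hippo_legs(n):
    """HiPPO-LegS matrix (0-indexed):
    A[i][k] = -sqrt((2i+1)(2k+1)) if i > k, -(i+1) if i == k, 0 if i < k."""
    return [
        [
            -math.sqrt((2 * i + 1) * (2 * k + 1)) if i > k
            else -(i + 1.0) if i == k
            else 0.0
            for k in range(n)
        ]
        for i in range(n)
    ]


def nplr_split(n):
    """Return (A, P, A + P P^T), where P[i] = sqrt(i + 1/2) is the assumed rank-1 factor."""
    A = hippo_legs(n)
    P = [math.sqrt(i + 0.5) for i in range(n)]
    normal = [[A[i][k] + P[i] * P[k] for k in range(n)] for i in range(n)]
    return A, P, normal


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    cols = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def is_normal(M, tol=1e-9):
    """A real matrix is normal iff it commutes with its transpose: M M^T == M^T M."""
    Mt = transpose(M)
    left, right = matmul(M, Mt), matmul(Mt, M)
    return all(abs(a - b) < tol for ra, rb in zip(left, right) for a, b in zip(ra, rb))


if __name__ == "__main__":
    A, P, N = nplr_split(8)
    print("A normal?", is_normal(A))          # lower-triangular with nonzero off-diagonal entries
    print("A + P P^T normal?", is_normal(N))  # equals -1/2 I plus a skew-symmetric matrix
```

For n = 8 the first check reports False, because a triangular matrix with nonzero off-diagonal entries cannot be normal. The second reports True: adding P P^T turns the diagonal into a constant -1/2 and makes the off-diagonal part skew-symmetric. A normal matrix can be diagonalized by a unitary change of basis, which is why this split avoids the numerical instability of diagonalizing A directly.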
Findings
Achieves 91% accuracy on sequential CIFAR-10 without data augmentation.
Closes the gap to Transformers on image and language modeling tasks, with 60x faster generation.
Sets state-of-the-art results on the Long Range Arena benchmark, including the Path-X task.
Abstract
A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of 10000 or more steps. A promising recent approach proposed modeling sequences by simulating the fundamental state space model (SSM) \( x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t) \), and showed that for appropriate choices of the state matrix \( A \), this system could handle long-range dependencies mathematically and empirically. However, this method has prohibitive computation and memory requirements, rendering it infeasible as a general sequence modeling solution. We propose the Structured State Space sequence model (S4)…
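The SSM above is defined in continuous time. The sketch below discretizes it with step size dt using the bilinear transform, Ā = (I - dt/2 A)^-1 (I + dt/2 A) and B̄ = (I - dt/2 A)^-1 dt B, which the S4 paper also uses. It then shows that the resulting discrete system can be run in two equivalent ways. One is as a recurrence, where each step costs a fixed amount, which is what makes fast autoregressive generation possible. The other is as a causal convolution with kernel K = (C B̄, C Ā B̄, ..., C Ā^(L-1) B̄). All function names and matrix values here are hypothetical and for illustration only. This naive kernel computation materializes powers of Ā at O(N^2 L) cost, which is exactly the bottleneck S4's structured A is designed to avoid.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]


def inverse(M):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + e for row, e in zip(M, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def discretize(A, B, dt):
    """Bilinear transform: Ab = (I - dt/2 A)^-1 (I + dt/2 A), Bb = (I - dt/2 A)^-1 dt B."""
    n = len(A)
    I = identity(n)
    left = inverse([[I[i][j] - dt / 2 * A[i][j] for j in range(n)] for i in range(n)])
    Ab = matmul(left, [[I[i][j] + dt / 2 * A[i][j] for j in range(n)] for i in range(n)])
    Bb = [dt * b for b in matvec(left, B)]
    return Ab, Bb


def run_recurrent(Ab, Bb, C, D, u):
    """x_k = Ab x_{k-1} + Bb u_k and y_k = C x_k + D u_k, starting from a zero state."""
    x = [0.0] * len(Bb)
    ys = []
    for uk in u:
        x = [ax + b * uk for ax, b in zip(matvec(Ab, x), Bb)]
        ys.append(sum(c * xi for c, xi in zip(C, x)) + D * uk)
    return ys


def ssm_kernel(Ab, Bb, C, L):
    """Naive kernel K[j] = C Ab^j Bb for j = 0..L-1."""
    K, v = [], list(Bb)
    for _ in range(L):
        K.append(sum(c * vi for c, vi in zip(C, v)))
        v = matvec(Ab, v)
    return K


def run_convolutional(K, D, u):
    """Causal convolution y_k = sum_{j<=k} K[j] u[k-j] + D u_k."""
    return [sum(K[j] * u[k - j] for j in range(k + 1)) + D * u[k] for k in range(len(u))]


if __name__ == "__main__":
    A = [[-0.5, 0.0, 0.0], [1.0, -1.0, 0.0], [0.5, -0.3, -2.0]]  # arbitrary stable example
    B, C, D = [1.0, 0.5, -0.2], [0.3, -0.7, 1.1], 0.1
    u = [1.0, 0.0, -0.5, 2.0, 0.3, 0.0, 1.2]
    Ab, Bb = discretize(A, B, dt=0.1)
    y_rec = run_recurrent(Ab, Bb, C, D, u)
    y_conv = run_convolutional(ssm_kernel(Ab, Bb, C, len(u)), D, u)
    print(max(abs(a - b) for a, b in zip(y_rec, y_conv)))  # round-off sized difference
```

The two outputs agree up to floating-point round-off, because unrolling the recurrence from a zero state gives y_k = Σ_j C Ā^j B̄ u_(k-j) + D u_k. In practice the convolutional form suits parallel training over a whole sequence, and the recurrent form suits step-by-step generation.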
Taxonomy
Topics: Topic Modeling · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Residual Block · Max Pooling · Average Pooling · Global Average Pooling · Batch Normalization · Residual Connection · 1x1 Convolution · Bottleneck Residual Block · Kaiming Initialization
