On the Parameterization and Initialization of Diagonal State Space Models
Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré

TL;DR
This paper systematically analyzes how to effectively parameterize and initialize diagonal state space models, demonstrating that a simple diagonal version of S4, called S4D, performs comparably to the original on various long-range sequence tasks.
Contribution
It provides a detailed understanding of diagonal SSM parameterization and initialization, introducing S4D, a simple yet effective diagonal S4 variant with state-of-the-art results.
Findings
S4D matches S4 performance across multiple domains.
Initialization is critical for diagonal SSM effectiveness.
S4D kernel computation is extremely simple, requiring just 2 lines of code.
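The kernel computation mentioned in the findings can be sketched as follows. This is a minimal NumPy illustration, not the authors' reference code. It assumes a zero-order-hold discretization, a diagonal state matrix stored as a complex vector `A`, and that only one member of each conjugate pair of eigenvalues is stored (hence the factor of 2 and the real part). The function names `s4d_kernel` and `s4d_lin_init` are hypothetical, and the S4D-Lin formula is written as we recall it from the paper, so treat it as an assumption.

```python
import numpy as np

def s4d_kernel(A, B, C, dt, L):
    """Length-L convolution kernel of a diagonal SSM (assumes ZOH discretization).

    A, B, C: complex arrays of shape (N,); dt: positive float step size.
    """
    dA = np.exp(dt * A)          # discretized diagonal state matrix, shape (N,)
    dB = B * (dA - 1.0) / A      # discretized input vector, shape (N,)
    # The two lines that do the work: a Vandermonde matrix and a contraction.
    powers = dA[None, :] ** np.arange(L)[:, None]   # shape (L, N), entry dA_n ** l
    return 2.0 * (powers @ (C * dB)).real           # shape (L,)

def s4d_lin_init(N):
    """Assumed S4D-Lin initialization: A_n = -1/2 + i * pi * n."""
    return -0.5 + 1j * np.pi * np.arange(N)

if __name__ == "__main__":
    N, L = 4, 8
    A = s4d_lin_init(N)
    B = np.ones(N, dtype=complex)
    C = np.array([1 + 1j, 0.5 - 0.2j, -0.3 + 0.7j, 0.1 + 0.1j])
    K = s4d_kernel(A, B, C, dt=0.1, L=L)
    print(K.shape)
```

Entry `l` of the result equals `2 * Re(sum_n C_n * dA_n**l * dB_n)`, which is the same value obtained by unrolling the diagonal recurrence for `l` steps. Convolving an input sequence with this kernel then gives the layer output.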
Abstract
State space models (SSM) have recently been shown to be very effective as a deep learning layer as a promising alternative to sequence models such as RNNs, CNNs, or Transformers. The first version to show this potential was the S4 model, which is particularly effective on tasks involving long-range dependencies by using a prescribed state matrix called the HiPPO matrix. While this has an interpretable mathematical mechanism for modeling long dependencies, it introduces a custom representation and algorithm that can be difficult to implement. On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix. This work seeks to systematically understand how to parameterize and initialize such diagonal state space…
Code & Models
- 🤗 togethercomputer/evo-1-131k-base · model · 5.3k dl · ♡ 114
- 🤗 togethercomputer/evo-1-8k-base · model · 3.1k dl · ♡ 10
- 🤗 andrewrreed/evo-1-131k-base · model · 4 dl
- 🤗 Rocketknight1/evo-1k-test · model · 10 dl · ♡ 1
- 🤗 LongSafari/evo-1-8k-crispr · model · 44 dl · ♡ 2
- 🤗 LongSafari/evo-1-8k-transposon · model · 40 dl · ♡ 1
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Advanced Memory and Neural Computing · Neural Networks and Applications
