Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta, Albert Gu, Jonathan Berant

TL;DR
This paper demonstrates that diagonal state space models can match the performance of structured state space models such as S4 on long-range sequence tasks, offering a simpler approach without sacrificing accuracy.
Contribution
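The authors show that state space models with purely diagonal state matrices perform as well as S4, which parameterizes its state matrices with a diagonal plus low rank structure, on long-range tasks. Dropping the low rank correction simplifies the architecture while maintaining effectiveness.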
Findings
- Diagonal models match S4 performance on Long Range Arena tasks
- Diagonal models perform well on speech classification tasks
- Simpler implementation without loss of accuracy
Abstract
Modeling long range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio and video. While attention-based models are a popular and effective choice in modeling short-range interactions, their performance on tasks requiring long range reasoning has been largely inadequate. In an exciting result, Gu et al. (ICLR 2022) proposed the $\textit{Structured State Space}$ (S4) architecture delivering large gains over state-of-the-art models on several long-range tasks across various modalities. The core proposition of S4 is the parameterization of state matrices via a diagonal plus low rank structure, allowing efficient computation. In this work, we show that one can match the performance of S4 even without the low rank correction and thus assuming the state matrices to be diagonal. Our $\textit{Diagonal…
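To make the role of a diagonal state matrix concrete, here is a minimal NumPy sketch of a generic discrete-time linear state space layer whose state matrix is diagonal. It is an illustration under stated assumptions, not the paper's parameterization: the function names, the real-valued eigenvalues `lam`, and the already-discretized form of the system are all hypothetical choices made for this example. It shows that with a diagonal state matrix the recurrence reduces to elementwise products, and that the same output can be obtained by a causal convolution with a kernel built from powers of the diagonal entries.

```python
import numpy as np

def diagonal_ssm_kernel(lam, B, C, length):
    """Convolution kernel K[k] = sum_i C[i] * lam[i]**k * B[i], k = 0..length-1.

    lam, B, C are 1-D arrays of size N (hypothetical names for this sketch).
    """
    # (length, N) matrix whose row k holds lam**k elementwise.
    powers = lam[None, :] ** np.arange(length)[:, None]
    return (powers * (B * C)[None, :]).sum(axis=1)

def diagonal_ssm_recurrence(lam, B, C, u):
    """Run x_k = lam * x_{k-1} + B * u_k, y_k = sum(C * x_k), starting from x = 0."""
    x = np.zeros_like(lam)
    ys = []
    for u_k in u:
        x = lam * x + B * u_k      # diagonal state matrix: elementwise update
        ys.append(np.sum(C * x))   # scalar readout
    return np.array(ys)

def causal_conv(K, u):
    """Causal convolution y_k = sum_{j<=k} K[k-j] * u[j]."""
    return np.convolve(K, u)[: len(u)]

# Assumed toy values: 4 state dimensions, eigenvalues inside (0, 1) for stability.
rng = np.random.default_rng(0)
lam = rng.uniform(0.1, 0.9, size=4)
B = rng.normal(size=4)
C = rng.normal(size=4)
u = rng.normal(size=16)

y_rec = diagonal_ssm_recurrence(lam, B, C, u)
y_conv = causal_conv(diagonal_ssm_kernel(lam, B, C, len(u)), u)
assert np.allclose(y_rec, y_conv)  # both views give the same output
```

Real-valued eigenvalues are used here only to keep the sketch short; state space models in this family commonly use complex-valued parameters, which this example does not attempt to reproduce.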
Code & Models
- 🤗 togethercomputer/evo-1-131k-base (model) · 5.3k dl · ♡ 114
- 🤗 togethercomputer/evo-1-8k-base (model) · 3.1k dl · ♡ 10
- 🤗 andrewrreed/evo-1-131k-base (model) · 4 dl
- 🤗 Rocketknight1/evo-1k-test (model) · 10 dl · ♡ 1
- 🤗 LongSafari/evo-1-8k-crispr (model) · 44 dl · ♡ 2
- 🤗 LongSafari/evo-1-8k-transposon (model) · 40 dl · ♡ 1
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI)
