What Makes Convolutional Models Great on Long Sequence Modeling?
Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, Debadeepta Dey

TL;DR
This paper analyzes the S4 model for long sequence modeling, identifying key principles behind its success, and proposes a simpler, more efficient convolutional model called SGConv that outperforms S4 in speed and effectiveness.
Contribution
The paper demystifies S4 by extracting core principles and introduces SGConv, a simpler global convolutional model that achieves comparable or better performance with higher efficiency.
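This page does not describe how SGConv builds its kernel. The sketch below is based on our reading of the paper, so treat every detail as an assumption. It builds a global kernel in that spirit by concatenating sub-kernels whose lengths grow geometrically. Each sub-kernel is linearly interpolated from a fixed number `d` of parameters and scaled by `decay**i`, so weights shrink with distance. The function names, the linear interpolation, the default `decay=0.5`, and the unit-norm normalization are all hypothetical choices, not the authors' code.

```python
import numpy as np

def num_scales_for(seq_len: int, d: int) -> int:
    """Hypothetical helper: smallest number of scales whose total kernel
    length d * 2**(n - 1) is at least seq_len."""
    return 1 + max(0, int(np.ceil(np.log2(seq_len / d))))

def multiscale_decaying_kernel(weights: np.ndarray, seq_len: int,
                               decay: float = 0.5) -> np.ndarray:
    """Hypothetical SGConv-style kernel (assumed form, not the paper's code).

    weights: shape (num_scales, d), i.e. d learnable values per scale.
    Sub-kernel i has length d * 2**max(i - 1, 0) (lengths d, d, 2d, 4d, ...),
    is linearly upsampled from its d values, and is scaled by decay**i.
    """
    num_scales, d = weights.shape
    src = np.linspace(0.0, 1.0, d)
    pieces = []
    for i in range(num_scales):
        length = d * 2 ** max(i - 1, 0)
        dst = np.linspace(0.0, 1.0, length)
        sub = np.interp(dst, src, weights[i])  # upsample d -> length points
        pieces.append(decay ** i * sub)        # farther scales get smaller weights
    kernel = np.concatenate(pieces)
    if kernel.shape[0] < seq_len:
        raise ValueError("not enough scales to cover seq_len")
    kernel = kernel[:seq_len]
    return kernel / np.linalg.norm(kernel)     # assumed unit-L2 normalization

rng = np.random.default_rng(0)
d, seq_len = 4, 64
n = num_scales_for(seq_len, d)                 # 5 scales: lengths 4, 4, 8, 16, 32
w = rng.standard_normal((n, d))
k = multiscale_decaying_kernel(w, seq_len)
```

In this sketch the parameter count is `n * d`, which grows with `log2(seq_len / d)` rather than with `seq_len`. A length-`seq_len` kernel therefore costs only a few dozen parameters here, and it could serve as the kernel of a global convolution.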
Findings
SGConv outperforms S4 on Long Range Arena and Speech Command datasets.
SGConv improves efficiency and performance when integrated into language and vision models.
The study clarifies the principles behind effective global convolutional models for long sequences.
Abstract
Convolutional models have been widely used in multiple domains. However, most existing models only use local convolution, making the model unable to handle long-range dependency efficiently. Attention overcomes this problem by aggregating global information but also makes the computational complexity quadratic to the sequence length. Recently, Gu et al. [2021] proposed a model called S4 inspired by the state space model. S4 can be efficiently implemented as a global convolutional model whose kernel size equals the input sequence length. S4 can model much longer sequences than Transformers and achieve significant gains over SoTA on several long-range tasks. Despite its empirical success, S4 is involved. It requires sophisticated parameterization and initialization schemes. As a result, S4 is less intuitive and hard to use. Here we aim to demystify S4 and extract basic principles that…
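The abstract notes that S4 can be run as a global convolution whose kernel is as long as the input sequence. The minimal sketch below shows why such a long kernel stays tractable. It assumes a single channel and a causal kernel. Computing the convolution with FFTs costs O(L log L), whereas the direct sum costs O(L^2), the same order as attention. The function name is hypothetical, and this is not the S4 or SGConv implementation.

```python
import numpy as np

def causal_global_conv(u: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Causal convolution of a length-L input u with a length-L kernel k.

    Computes y[t] = sum_{s=0}^{t} k[s] * u[t - s] in O(L log L) via FFT.
    Zero-padding both signals to 2L prevents circular wrap-around, so
    output position t only sees inputs at positions <= t.
    """
    L = u.shape[-1]
    assert k.shape[-1] == L, "global kernel: length must equal sequence length"
    n = 2 * L
    U = np.fft.rfft(u, n=n)       # rfft zero-pads to length n
    K = np.fft.rfft(k, n=n)
    y = np.fft.irfft(U * K, n=n)  # pointwise product = linear convolution
    return y[..., :L]             # keep the causal part

rng = np.random.default_rng(0)
L = 1024
u = rng.standard_normal(L)
k = rng.standard_normal(L)
y = causal_global_conv(u, k)
```

The first `L` outputs of NumPy's direct `np.convolve(u, k)` give the same result, so it serves as an O(L^2) reference check.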
Code & Models
- 🤗 togethercomputer/evo-1-131k-base (model) · 5.3k downloads · 114 likes
- 🤗 togethercomputer/evo-1-8k-base (model) · 3.1k downloads · 10 likes
- 🤗 andrewrreed/evo-1-131k-base (model) · 4 downloads
- 🤗 Rocketknight1/evo-1k-test (model) · 10 downloads · 1 like
- 🤗 LongSafari/evo-1-8k-crispr (model) · 44 downloads · 2 likes
- 🤗 LongSafari/evo-1-8k-transposon (model) · 40 downloads · 1 like
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Convolution
