Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

TL;DR
This paper demonstrates that with careful design, deep RNNs can match the performance and efficiency of deep state-space models on long sequence tasks, providing new insights into their relative strengths.
Contribution
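The authors show that a series of specific design choices, including linearizing and diagonalizing the recurrence and using better parameterizations and initializations, lets standard RNNs match deep state-space models. They also introduce the Linear Recurrent Unit, an RNN block for long-range reasoning.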
Findings
Deep RNNs can match SSM performance with proper design.
The Linear Recurrent Unit matches the performance and efficiency of deep SSMs on the Long Range Arena benchmark.
The design changes let RNNs match the training speed of deep SSMs while performing well on long sequences.
Abstract
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. However, while SSMs are superficially similar to RNNs, there are important differences that make it unclear where their performance boost over RNNs comes from. In this paper, we show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks, while also matching their training speed. To achieve this, we analyze and ablate a series of changes to standard RNNs including linearizing and diagonalizing the recurrence, using better parameterizations and initializations, and…
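The abstract's "linearizing and diagonalizing the recurrence" can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, shapes, and the exponential eigenvalue parameterization (used here as one way to keep every eigenvalue inside the unit disk) are assumptions made for this illustration. It shows why a linear diagonal recurrence is cheap at inference time (an elementwise update per step) and why it can be trained in parallel (each state has a closed-form unrolled expression).

```python
import numpy as np

def stable_eigenvalues(nu_log, theta_log):
    # Assumed parameterization for this sketch:
    # lam = exp(-exp(nu_log) + i * exp(theta_log)),
    # so |lam| = exp(-exp(nu_log)) is always strictly below 1.
    return np.exp(-np.exp(nu_log) + 1j * np.exp(theta_log))

def diagonal_linear_recurrence(lam, B, u):
    # Sequential form: x_k = lam * x_{k-1} + B @ u_k, starting from a zero state.
    # lam: (N,) complex diagonal of the recurrence
    # B:   (N, H) complex input projection
    # u:   (L, H) real input sequence
    # Returns the states as an (L, N) complex array.
    seq_len, n_state = u.shape[0], lam.shape[0]
    x = np.zeros(n_state, dtype=np.complex128)
    states = np.empty((seq_len, n_state), dtype=np.complex128)
    for k in range(seq_len):
        # Diagonal recurrence: an elementwise product, not a matrix product.
        x = lam * x + B @ u[k]
        states[k] = x
    return states

def unrolled_state(lam, B, u, k):
    # Closed form of the same state: x_k = sum_{j<=k} lam^(k-j) * (B @ u_j).
    # No nonlinearity sits between steps, so every term is independent,
    # which is what makes parallel training possible.
    return sum(lam ** (k - j) * (B @ u[j]) for j in range(k + 1))

rng = np.random.default_rng(0)
lam = stable_eigenvalues(rng.normal(size=4), rng.normal(size=4))
B = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))
u = rng.normal(size=(6, 3))
states = diagonal_linear_recurrence(lam, B, u)
```

In this sketch the sequential loop and the unrolled sum compute the same state, so the loop can serve inference while the unrolled form (or a parallel scan over it) can serve training.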
Code & Models
- 🤗 togethercomputer/evo-1-131k-base · model · 5.3k dl · ♡ 114
- 🤗 togethercomputer/evo-1-8k-base · model · 3.1k dl · ♡ 10
- 🤗 andrewrreed/evo-1-131k-base · model · 4 dl
- 🤗 Rocketknight1/evo-1k-test · model · 10 dl · ♡ 1
- 🤗 LongSafari/evo-1-8k-crispr · model · 44 dl · ♡ 2
- 🤗 LongSafari/evo-1-8k-transposon · model · 40 dl · ♡ 1
Taxonomy
Topics: Neural Networks and Applications · Fault Detection and Control Systems · Machine Learning in Materials Science
