Parallel Recursive LSTM

Tristan Gaudreault; Yongyi Mao

arXiv:2605.17108 · cs.LG · May 19, 2026

TL;DR

The paper introduces the Parallel Recursive LSTM (PR-LSTM), a hierarchical recurrent model that replaces left-to-right recurrence with recursive state composition over a balanced tree, shortening the chain of dependent steps so that more of the computation can run in parallel. On formal-language tasks it solves more tasks than standard RNN, LSTM, and Transformer baselines.

Contribution

PR-LSTM reorganizes recurrent computation into a hierarchy: per-token states are merged recursively rather than updated one token at a time. This reduces parallel depth while keeping the state representation nonlinear, which enables efficient processing of long sequences.
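To make the parallel-depth claim concrete, here is a minimal sketch that counts dependent merge steps for a left-to-right chain versus a balanced binary tree. The function names are hypothetical, and the counting convention (one step per merge of two states, with all merges at one tree level running at once) is an assumption for illustration, not a figure taken from the paper.

```python
def sequential_depth(n: int) -> int:
    """Dependent merges when n states are folded strictly left to right."""
    return max(n - 1, 0)


def tree_depth(n: int) -> int:
    """Levels of pairwise merging needed to reduce n states to one.

    Equals ceil(log2(n)) for n > 1, computed exactly with integer arithmetic.
    """
    return (n - 1).bit_length() if n > 1 else 0


if __name__ == "__main__":
    for n in (8, 1024, 100_000):
        print(n, sequential_depth(n), tree_depth(n))
```

Under this convention a sequence of 1024 states needs 1023 dependent steps as a chain but only 10 levels as a balanced tree, provided enough parallel hardware is available to run each level at once.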

Findings

01. PR-LSTM achieves strong sequence-length generalization on formal-language benchmarks.

02. It solves more tasks than standard RNN, LSTM, and Transformer baselines.

03. PR-LSTM avoids the quadratic scaling of attention in long sequences.

Abstract

Transformers have become the dominant architecture for sequence modeling by using self-attention to enable expressive and highly parallel processing. However, the resulting quadratic time and memory costs limit efficiency in long-context settings. Recurrent models such as LSTMs provide explicit nonlinear state updates and strong state-tracking capabilities, yet their strictly sequential computation limits parallelism. We introduce the Parallel Recursive LSTM (PR-LSTM), a hierarchical recurrent architecture that replaces left-to-right recurrence with recursive nonlinear state composition over a balanced computation tree. Tokens are first mapped independently to latent states, which are then recursively merged by a learned gated composition block. This structure uses the reduction pattern underlying parallel scans as a fixed execution schedule, rather than assuming an associative…
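The two stages described in the abstract (independent token-to-state mapping, then recursive pairwise merging over a balanced tree) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the gate layout in `compose`, the parameter names, the `tanh` embedding, and the rule of carrying an unpaired last state up unchanged are all hypothetical, since the excerpt above does not specify the composition block.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d_in, d, seed=0):
    """Random parameters for the sketch (names and shapes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "W_embed": rng.normal(0.0, 0.1, (d_in, d)),
        "W_gate": rng.normal(0.0, 0.1, (2 * d, 3 * d)),
        "b_gate": np.zeros(3 * d),
        "W_cand": rng.normal(0.0, 0.1, (2 * d, d)),
        "b_cand": np.zeros(d),
    }


def compose(left, right, p):
    """Hypothetical gated merge of adjacent states; inputs have shape (m, d)."""
    pair = np.concatenate([left, right], axis=-1)        # (m, 2d)
    gates = sigmoid(pair @ p["W_gate"] + p["b_gate"])    # (m, 3d)
    g_left, g_right, g_new = np.split(gates, 3, axis=-1)
    cand = np.tanh(pair @ p["W_cand"] + p["b_cand"])     # nonlinear candidate
    return g_left * left + g_right * right + g_new * cand


def tree_reduce(tokens, p):
    """Reduce (n, d_in) token features to one (d,) state; requires n >= 1."""
    # Stage 1: every token is mapped to a latent state independently.
    states = np.tanh(tokens @ p["W_embed"])
    levels = 0
    # Stage 2: merge neighbours level by level. All merges within a level
    # are independent, so they run as one batched call to compose.
    while states.shape[0] > 1:
        n = states.shape[0]
        even = n - (n % 2)
        merged = compose(states[0:even:2], states[1:even:2], p)
        if n % 2:  # assumption: an unpaired last state is carried up as-is
            merged = np.concatenate([merged, states[-1:]], axis=0)
        states = merged
        levels += 1
    return states[0], levels


if __name__ == "__main__":
    tokens = np.random.default_rng(1).normal(size=(13, 4))
    root, levels = tree_reduce(tokens, init_params(d_in=4, d=8))
    print(root.shape, levels)
```

Because the merge order is fixed by the tree rather than derived from an algebraic property, `compose` does not need to be associative. The number of levels grows logarithmically with sequence length, while the total number of merges stays linear.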
