Towards Scalable and Stable Parallelization of Nonlinear RNNs

Xavier Gonzalez; Andrew Warrington; Jimmy T.H. Smith; Scott W. Linderman

arXiv:2407.19115 · cs.LG · January 17, 2025 · 1 citation

TL;DR

This paper applies quasi-Newton approximations and a Kalman-smoothing-based stabilization to DEER, a parallel Newton method for evaluating nonlinear RNNs, addressing DEER's cubic cost in the state size and its numerical instability.

Contribution

The paper proposes quasi-Newton approximations and a Kalman smoothing-based stabilization method to improve the efficiency and stability of parallel nonlinear RNN evaluation.

Findings

01. Quasi-Newton methods converge similarly to Newton's method with less memory.

02. The ELK method stabilizes Newton's method using Kalman smoothing.

03. Experiments show improved scalability and stability in nonlinear RNN evaluation.

Abstract

Transformers and linear state space models can be evaluated in parallel on modern hardware, but evaluating nonlinear RNNs appears to be an inherently sequential problem. Recently, however, Lim et al. '24 developed an approach called DEER, which evaluates nonlinear RNNs in parallel by posing the states as the solution to a fixed-point problem. They derived a parallel form of Newton's method to solve the fixed-point problem and achieved significant speedups over sequential evaluation. However, the computational complexity of DEER is cubic in the state size, and the algorithm can suffer from numerical instability. We address these limitations with two novel contributions. To reduce the computational complexity, we apply quasi-Newton approximations and show they converge comparably to Newton, use less memory, and are faster. To stabilize DEER, we leverage a connection between the…
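
The fixed-point formulation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (their repository uses JAX); it is a pure-Python toy under stated assumptions: the RNN is h_t = tanh(W h_{t-1} + x_t), the quasi-Newton approximation keeps only the diagonal of the Jacobian (the abstract does not say which approximation is used), and a Hillis-Steele scan stands in for a hardware parallel scan. All function names (`step`, `combine`, `scan_affine`, `quasi_newton_rollout`) are hypothetical. Each iteration linearizes the recurrence around the current guess for the whole trajectory, which yields an affine recurrence per state coordinate that an associative scan can evaluate.

```python
import math

def step(W, h, x):
    """One nonlinear RNN step (assumed form): h_next = tanh(W @ h + x)."""
    d = len(h)
    return [math.tanh(sum(W[k][j] * h[j] for j in range(d)) + x[k])
            for k in range(d)]

def sequential_rollout(W, h0, xs):
    """Reference evaluation: one step after another."""
    hs, h = [], h0
    for x in xs:
        h = step(W, h, x)
        hs.append(h)
    return hs

def combine(first, second):
    """Compose affine maps h -> a*h + b, applying `first` and then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)

def scan_affine(elems):
    """Inclusive Hillis-Steele scan. Every pass is elementwise, so the passes
    could run in parallel; here they are ordinary list comprehensions."""
    elems = list(elems)
    shift = 1
    while shift < len(elems):
        elems = [combine(elems[i - shift], elems[i]) if i >= shift else elems[i]
                 for i in range(len(elems))]
        shift *= 2
    return elems

def quasi_newton_rollout(W, h0, xs, tol=1e-12):
    """Iterate on the whole trajectory using a diagonal Jacobian approximation."""
    T, d = len(xs), len(h0)
    hs = [[0.0] * d for _ in range(T)]          # initial guess for all states
    for it in range(T):                         # at most T iterations
        prev = [h0] + hs[:-1]                   # h_{t-1} under the current guess
        fs = [step(W, prev[t], xs[t]) for t in range(T)]
        new = [[0.0] * d for _ in range(T)]
        for k in range(d):
            # Diagonal of df/dh at prev[t]: (1 - tanh(z_k)^2) * W[k][k]
            a = [(1.0 - fs[t][k] ** 2) * W[k][k] for t in range(T)]
            # Offset so the linearized map agrees with f at the current guess
            b = [fs[t][k] - a[t] * prev[t][k] for t in range(T)]
            for t, (A, B) in enumerate(scan_affine(zip(a, b))):
                new[t][k] = A * h0[k] + B
        err = max(abs(new[t][k] - hs[t][k]) for t in range(T) for k in range(d))
        hs = new
        if err < tol:
            return hs, it + 1
    return hs, T

# Usage: compare against the sequential rollout on made-up inputs.
W = [[0.5, -0.3], [0.2, 0.4]]
h0 = [0.0, 0.0]
xs = [[math.sin(0.3 * t), math.cos(0.2 * t)] for t in range(16)]
hs_par, iters = quasi_newton_rollout(W, h0, xs)
hs_seq = sequential_rollout(W, h0, xs)
max_err = max(abs(p - s) for hp, hq in zip(hs_par, hs_seq) for p, s in zip(hp, hq))
```

Because the diagonal approximation makes each coordinate's recurrence scalar, the scan combines pairs of numbers rather than multiplying dense matrices, which is where the savings relative to the cubic cost mentioned in the abstract would come from. The toy weights are small enough that the iteration stays well behaved; the instability the abstract refers to is not exercised here.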

Code & Models

Repositories

lindermanlab/elk (JAX, official)

Videos

Towards Scalable and Stable Parallelization of Nonlinear RNNs · SlidesLive

Taxonomy

Topics: Quantum-Dot Cellular Automata · Neural Networks and Applications