Towards Scalable and Stable Parallelization of Nonlinear RNNs
Xavier Gonzalez, Andrew Warrington, Jimmy T.H. Smith, Scott W. Linderman

TL;DR
This paper applies quasi-Newton approximations and a stabilization technique to DEER, a parallel Newton-based method for evaluating nonlinear RNNs, addressing its cubic cost in the state size and its numerical instability.
Contribution
It proposes quasi-Newton approximations and a Kalman smoothing-based stabilization method to improve the efficiency and stability of parallel nonlinear RNN evaluation.
Findings
Quasi-Newton approximations converge comparably to full Newton's method while using less memory and running faster.
The ELK method stabilizes Newton's method using Kalman smoothing.
Experiments show improved scalability and stability in nonlinear RNN evaluation.
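To make the memory claim above concrete, here is a minimal sketch of a quasi-Newton iteration. The summary does not say which approximation is used, so this assumes one simple choice: keeping only the diagonal of each step's Jacobian, so that each time step stores D numbers instead of a D x D matrix. The cell, the weights `W`, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical recurrent weights for a 2-unit cell (illustrative values only).
W = [[0.5, -0.3],
     [0.2, 0.4]]
D = len(W)

def cell(s_prev, x):
    """Assumed cell: s_t = tanh(W s_{t-1} + x_t), written without NumPy."""
    return [math.tanh(sum(W[i][j] * s_prev[j] for j in range(D)) + x[i])
            for i in range(D)]

def jacobian_diag(s_prev, x):
    """Diagonal of d cell / d s_prev; entry i is (1 - s_t[i]**2) * W[i][i]."""
    s = cell(s_prev, x)
    return [(1.0 - s[i] ** 2) * W[i][i] for i in range(D)]

def sequential_states(s0, xs):
    """Reference: ordinary step-by-step evaluation."""
    out, s = [], s0
    for x in xs:
        s = cell(s, x)
        out.append(s)
    return out

def quasi_newton_states(s0, xs, max_iters=100, tol=1e-12):
    """Iterate on all T states at once using a diagonal Jacobian approximation."""
    T = len(xs)
    states = [[0.0] * D for _ in range(T)]  # initial guess for s_1..s_T
    for it in range(max_iters):
        prev = [s0] + states[:-1]           # s_{t-1} under the current guess
        # These two lines are independent across t, so they could run in parallel.
        fs = [cell(p, x) for p, x in zip(prev, xs)]
        ds = [jacobian_diag(p, x) for p, x in zip(prev, xs)]
        # Elementwise linear recurrence: s_t = f_t + d_t * (s_{t-1} - prev_t).
        # A loop stands in here for a parallel scan over time.
        new_states, s = [], s0
        for p, f_t, d_t in zip(prev, fs, ds):
            s = [f_t[i] + d_t[i] * (s[i] - p[i]) for i in range(D)]
            new_states.append(s)
        delta = max(abs(a - b)
                    for n, o in zip(new_states, states)
                    for a, b in zip(n, o))
        states = new_states
        if delta < tol:
            return states, it + 1
    return states, max_iters
```

The fixed point is unchanged by the approximation: once the guess equals the true trajectory, the correction term vanishes and each state reduces to `cell(s_prev, x)`. Only the path taken to reach that fixed point differs from full Newton.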
Abstract
Transformers and linear state space models can be evaluated in parallel on modern hardware, but evaluating nonlinear RNNs appears to be an inherently sequential problem. Recently, however, Lim et al. '24 developed an approach called DEER, which evaluates nonlinear RNNs in parallel by posing the states as the solution to a fixed-point problem. They derived a parallel form of Newton's method to solve the fixed-point problem and achieved significant speedups over sequential evaluation. However, the computational complexity of DEER is cubic in the state size, and the algorithm can suffer from numerical instability. We address these limitations with two novel contributions. To reduce the computational complexity, we apply quasi-Newton approximations and show they converge comparably to Newton, use less memory, and are faster. To stabilize DEER, we leverage a connection between the…
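As a rough illustration of the fixed-point idea described in the abstract, the sketch below uses a hypothetical scalar tanh cell. The names `f`, `df_ds`, `newton_eval` and the weights `w`, `u` are assumptions, not taken from the paper. Each Newton step linearizes the cell around the current guess, which turns the update into a linear recurrence s_t = J_t s_{t-1} + b_t. That recurrence is the part a parallel associative scan could evaluate; a plain loop stands in for the scan here.

```python
import math

def f(s_prev, x, w=0.9, u=0.5):
    """Hypothetical scalar RNN cell: s_t = tanh(w * s_{t-1} + u * x_t)."""
    return math.tanh(w * s_prev + u * x)

def df_ds(s_prev, x, w=0.9, u=0.5):
    """Derivative of f with respect to the previous state."""
    return w * (1.0 - math.tanh(w * s_prev + u * x) ** 2)

def sequential_eval(s0, xs):
    """Reference: ordinary step-by-step evaluation."""
    states, s = [], s0
    for x in xs:
        s = f(s, x)
        states.append(s)
    return states

def newton_eval(s0, xs, num_iters=50, tol=1e-12):
    """Solve s_t - f(s_{t-1}, x_t) = 0 for all t jointly with Newton's method."""
    T = len(xs)
    states = [0.0] * T                       # initial guess for s_1..s_T
    for it in range(num_iters):
        prev = [s0] + states[:-1]            # s_{t-1} under the current guess
        # Linearization is independent across time steps (parallelizable).
        J = [df_ds(p, x) for p, x in zip(prev, xs)]
        b = [f(p, x) - j * p for p, x, j in zip(prev, xs, J)]
        # Linear recurrence s_t = J_t * s_{t-1} + b_t; sequential stand-in
        # for what would be a parallel scan.
        new_states, s = [], s0
        for j, b_t in zip(J, b):
            s = j * s + b_t
            new_states.append(s)
        delta = max(abs(a - c) for a, c in zip(new_states, states))
        states = new_states
        if delta < tol:
            return states, it + 1
    return states, num_iters
```

For example, `newton_eval(0.0, xs)` and `sequential_eval(0.0, xs)` should agree to within floating-point tolerance on a short input sequence. In the vector-valued case each `J_t` becomes a D x D matrix, and composing those matrices inside the scan is where a cost cubic in the state size comes from.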
Taxonomy
Topics: Quantum-Dot Cellular Automata · Neural Networks and Applications
