Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

Dar Gilboa; Bo Chang; Minmin Chen; Greg Yang; Samuel S. Schoenholz; Ed H. Chi; Jeffrey Pennington

arXiv:1901.08987 · cs.LG · May 27, 2019 · 26 citations

TL;DR

This paper develops a mean field theory for LSTMs and GRUs to understand signal propagation, leading to a new initialization scheme that improves training stability and performance on long sequence tasks.

Contribution

The authors introduce a mean field theory for LSTMs and GRUs, deriving an initialization scheme that reduces training instabilities and enhances generalization.

Findings

1. The new initialization scheme improves training stability on long sequences.

2. The scheme enables successful training where standard methods fail.

3. Better generalization was observed with the proposed initialization.

Abstract

Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and the GRU, do exhibit modest improvements over vanilla RNN cells, but they still suffer from instabilities when trained on very long sequences. In this work, we develop a mean field theory of signal propagation in LSTMs and GRUs that enables us to calculate the time scales for signal propagation as well as the spectral properties of the state-to-state Jacobians. By optimizing these quantities in terms of the initialization hyperparameters, we derive a novel initialization scheme that eliminates…
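
The exponential growth or decay of signals mentioned in the abstract can be illustrated numerically. The sketch below is not the paper's LSTM/GRU analysis or its initialization scheme: it assumes a plain tanh RNN, h_{t+1} = tanh(W h_t), with Gaussian weights of variance sigma_w^2 / n, as a simpler stand-in. The function name and all parameters are hypothetical. It tracks the log of the spectral norm of the product of state-to-state Jacobians over time.

```python
import numpy as np


def jacobian_product_log_norm(sigma_w, n=100, steps=50, seed=0):
    """Log spectral norm of the product of state-to-state Jacobians.

    Illustrative assumption: a vanilla tanh RNN h_{t+1} = tanh(W h_t)
    with W_ij ~ N(0, sigma_w^2 / n). This is not an LSTM or GRU.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, sigma_w / np.sqrt(n), size=(n, n))
    h = rng.normal(size=n)

    product = np.eye(n)  # running (rescaled) product of Jacobians
    log_norm = 0.0       # log of the scale factored out so far
    for _ in range(steps):
        h = np.tanh(W @ h)
        # d h_{t+1} / d h_t = diag(1 - tanh^2(W h_t)) @ W
        jac = (1.0 - h**2)[:, None] * W
        product = jac @ product
        # Rescale each step to avoid overflow or underflow,
        # accumulating the removed scale in log space.
        scale = np.linalg.norm(product, 2)
        product /= scale
        log_norm += np.log(scale)
    return float(log_norm)


if __name__ == "__main__":
    for sigma_w in (0.5, 1.0, 3.0):
        print(sigma_w, jacobian_product_log_norm(sigma_w))
```

A strongly negative value indicates that perturbations to the hidden state shrink exponentially over the 50 steps, while a large positive value indicates exponential growth. Comparing a small and a large `sigma_w` shows how sensitive this quantity is to the weight variance chosen at initialization.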

Taxonomy

Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing · Model Reduction and Neural Networks

Methods: Sigmoid Activation · Tanh Activation · Gated Recurrent Unit · Long Short-Term Memory