Stabilizing RNN Gradients through Pre-training
Luca Herranz-Celotti, Jean Rouat

TL;DR
This paper introduces a pre-training method that stabilizes gradients in RNNs whose architectures are too complex for an analytical initialization. It also extends existing stability theories and adjusts the weighting of gradient contributions, which leads to improved training outcomes.
Contribution
It extends stability theory to a broader family of deep recurrent networks and proposes a novel pre-training approach that mitigates exponential gradient explosion.
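The pre-training idea can be illustrated with a toy sketch. It assumes a scalar tanh RNN h_t = tanh(w·h_{t-1} + u·x_t), treats the absolute value of the local transition derivative dh_t/dh_{t-1} as its "radius", and trains only w so that this value moves toward a target ρ = 1, using finite-difference gradient descent. The functions local_jacobians, stability_loss and pretrain, and every hyperparameter, are hypothetical. This is not the authors' implementation, which targets deep recurrent networks with matrix-valued Jacobians.

```python
import math
import random


def local_jacobians(w, u, xs, h0=0.0):
    """Run a scalar tanh RNN h_t = tanh(w*h_{t-1} + u*x_t) and return each
    local transition derivative dh_t/dh_{t-1} = (1 - h_t**2) * w."""
    h = h0
    jacs = []
    for x in xs:
        h = math.tanh(w * h + u * x)
        jacs.append((1.0 - h * h) * w)
    return jacs


def stability_loss(w, u, xs, rho=1.0):
    """Mean squared gap between |dh_t/dh_{t-1}| and the target radius rho."""
    jacs = local_jacobians(w, u, xs)
    return sum((abs(j) - rho) ** 2 for j in jacs) / len(jacs)


def pretrain(w, u, xs, rho=1.0, lr=0.05, steps=300, eps=1e-5):
    """Pre-train only the recurrent weight w by gradient descent on
    stability_loss, estimating the gradient with central finite differences."""
    for _ in range(steps):
        grad = (stability_loss(w + eps, u, xs, rho)
                - stability_loss(w - eps, u, xs, rho)) / (2.0 * eps)
        w -= lr * grad
    return w


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]  # toy input sequence
w0, u = 0.2, 0.5                                  # small initial recurrent weight
w1 = pretrain(w0, u, xs)
print("loss before:", stability_loss(w0, u, xs))
print("loss after: ", stability_loss(w1, u, xs))
```

After pre-training, the stability loss is lower than at initialization, so the per-step derivatives sit closer to the target. Finite differences keep the sketch dependency-free; for real networks an automatic-differentiation framework would be the natural choice.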
Findings
Pre-training to local stability improves network performance.
Classical Glorot, He, and Orthogonal initializations satisfy the Local Stability Condition (LSC) in feed-forward fully-connected networks.
Adjusting the weighting of gradient contributions reduces exponential gradient explosion in deep recurrent networks.
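To see why the weighting of gradient contributions matters in deep RNNs, the arithmetic below counts gradient paths in a stacked RNN. It assumes that each state h_{t,l} depends on h_{t-1,l} and h_{t,l-1}, that every transition has the same gain ρ, and that gains multiply along a path. The function name path_gain_sum and the value ρ = 0.5 are illustrative choices, not necessarily the paper's exact prescription.

```python
from math import comb


def path_gain_sum(T, L, rho):
    """Sum of rho**(T+L) over all monotone paths between (t=0, l=0) and
    (t=T, l=L) in a stacked RNN. Each path takes T time steps and L layer
    steps in some order, so there are comb(T+L, L) such paths."""
    return comb(T + L, L) * rho ** (T + L)


for n in (10, 20, 40):
    # Grow time and depth together (T = L = n).
    print(n, path_gain_sum(n, n, 1.0), path_gain_sum(n, n, 0.5))
```

With ρ = 1 the sum equals comb(2n, n), which grows roughly like 4^n when time and depth grow together. With ρ = 0.5 every term is at most 1, because comb(n, k) ≤ 2^n, and summing over all depth/time splits of a fixed path length n gives exactly 1.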
Abstract
Numerous theories of learning propose to prevent the gradient from exponential growth with depth or time, to stabilize and improve training. Typically, these analyses are conducted on feed-forward fully-connected neural networks or simple single-layer recurrent neural networks, given their mathematical tractability. In contrast, this study demonstrates that pre-training the network to local stability can be effective whenever the architectures are too complex for an analytical initialization. Furthermore, we extend known stability theories to encompass a broader family of deep recurrent networks, requiring minimal assumptions on data and parameter distribution, a theory we call the Local Stability Condition (LSC). Our investigation reveals that the classical Glorot, He, and Orthogonal initialization schemes satisfy the LSC when applied to feed-forward fully-connected neural networks.…
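A quick numerical check can illustrate the claim for He initialization. The sketch assumes that, for one ReLU layer y = relu(W h), the LSC can be probed through the expected squared gain ||J v||² / ||v||² = 1, where J = diag(1[W h > 0]) W is the layer Jacobian. This proxy is a simplification of the paper's statement. The function he_relu_gain, the width and the trial count are hypothetical.

```python
import random


def he_relu_gain(n, rng):
    """One Monte Carlo sample of ||J v||^2 / ||v||^2 for a ReLU layer
    y = relu(W h) with W_ij ~ N(0, 2/n) (He initialization).
    The layer Jacobian is J = diag(1[W h > 0]) W."""
    std = (2.0 / n) ** 0.5
    W = [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(n)]
    h = [rng.gauss(0.0, 1.0) for _ in range(n)]  # layer input
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # perturbation direction
    num = 0.0
    for row in W:
        pre = sum(w_ij * h_j for w_ij, h_j in zip(row, h))
        if pre > 0.0:  # the ReLU derivative is 1 only where the unit is active
            num += sum(w_ij * v_j for w_ij, v_j in zip(row, v)) ** 2
    return num / sum(v_j * v_j for v_j in v)


rng = random.Random(0)
samples = [he_relu_gain(128, rng) for _ in range(30)]
mean_gain = sum(samples) / len(samples)
print(mean_gain)  # close to 1 up to Monte Carlo noise
```

The expected gain is exactly 1. Flipping a row w_i to -w_i flips the ReLU mask but leaves (w_i·v)² unchanged, so each active row contributes on average half of the variance 2/n times ||v||². Summed over n rows, this gives n · (2/n) / 2 = 1.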
Taxonomy
Topics: Neural Networks and Applications · Ferroelectric and Negative Capacitance Devices · Machine Learning in Materials Science
