Stabilizing RNN Gradients through Pre-training

Luca Herranz-Celotti; Jean Rouat

arXiv:2308.12075 · cs.LG · January 8, 2024

TL;DR

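This paper shows that pre-training a network to local stability stabilizes gradients in recurrent architectures that are too complex for an analytical initialization. It also extends known stability theories to a broader family of deep recurrent networks, under the name Local Stability Condition (LSC).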

Contribution

The paper extends stability theory to complex recurrent networks and proposes a pre-training approach that mitigates exponential gradient explosion.

Findings

01 Pre-training to local stability improves network performance.

02 Classical initializations satisfy the Local Stability Condition in feed-forward networks.

03 Adjusting gradient weighting reduces exponential explosion in deep recurrent networks.
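The idea of pre-training to local stability can be illustrated with a small sketch. This page does not define the Local Stability Condition precisely, so the code below assumes it can be approximated by driving the spectral radius of the one-step state Jacobian of a tanh recurrence toward a target of 1. The multiplicative rescaling rule is a simple stand-in chosen for illustration, not the paper's pre-training procedure, and all function names (`mean_jacobian_radius`, `pretrain_to_radius`) are hypothetical.

```python
import numpy as np


def spectral_radius(m):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(m))))


def mean_jacobian_radius(w, states):
    """Average spectral radius of d h_next / d h for h_next = tanh(w @ h)."""
    radii = []
    for h in states:
        h_next = np.tanh(w @ h)
        # Chain rule: diag(1 - tanh^2) @ w, written as a row-wise scaling of w.
        jac = (1.0 - h_next ** 2)[:, None] * w
        radii.append(spectral_radius(jac))
    return float(np.mean(radii))


def pretrain_to_radius(w, states, target=1.0, steps=60, rate=0.5):
    """Rescale w repeatedly until the mean Jacobian radius approaches target.

    Assumption: a damped multiplicative update replaces gradient descent
    on a stability loss, purely to keep the sketch dependency-free.
    """
    w = w.copy()
    for _ in range(steps):
        radius = mean_jacobian_radius(w, states)
        w *= (target / radius) ** rate
    return w


rng = np.random.default_rng(0)
n = 32
# Deliberately small initial weights, so the Jacobian radius starts well below 1.
w0 = rng.normal(0.0, 0.5 / np.sqrt(n), size=(n, n))
# Hypothetical probe states standing in for hidden states seen on real data.
states = rng.uniform(-1.0, 1.0, size=(8, n))

before = mean_jacobian_radius(w0, states)
w1 = pretrain_to_radius(w0, states)
after = mean_jacobian_radius(w1, states)
print(f"mean Jacobian radius before: {before:.3f}, after: {after:.3f}")
```

The sketch only touches the recurrent matrix and a single time step. A deep recurrent network would also have Jacobians along depth, which the findings above suggest need their own weighting.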

Abstract

Numerous theories of learning propose to prevent the gradient from exponential growth with depth or time, to stabilize and improve training. Typically, these analyses are conducted on feed-forward fully-connected neural networks or simple single-layer recurrent neural networks, given their mathematical tractability. In contrast, this study demonstrates that pre-training the network to local stability can be effective whenever the architectures are too complex for an analytical initialization. Furthermore, we extend known stability theories to encompass a broader family of deep recurrent networks, requiring minimal assumptions on data and parameter distribution, a theory we call the Local Stability Condition (LSC). Our investigation reveals that the classical Glorot, He, and Orthogonal initialization schemes satisfy the LSC when applied to feed-forward fully-connected neural networks.…
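The claim about classical initializations can be probed numerically on a single square fully-connected layer. The sketch below assumes, since this page does not give the formal statement of the LSC, that local stability can be read as the layer's input-output Jacobian having a spectral radius close to 1. The pairing of each scheme with an activation (Glorot and Orthogonal with a linear layer, He with ReLU) is also an assumption made for illustration, and the helper names are hypothetical.

```python
import numpy as np


def spectral_radius(m):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(m))))


def glorot_normal(n, rng):
    # Glorot normal: variance 2 / (fan_in + fan_out), which is 1 / n here.
    return rng.normal(0.0, np.sqrt(2.0 / (n + n)), size=(n, n))


def he_normal(n, rng):
    # He normal: variance 2 / fan_in.
    return rng.normal(0.0, np.sqrt(2.0 / n), size=(n, n))


def orthogonal(n, rng):
    # QR of a Gaussian matrix; the sign fix keeps the result orthogonal.
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))


def relu_jacobian(w, x):
    # Jacobian of relu(w @ x) with respect to x: inactive units zero their row.
    active = (w @ x > 0).astype(w.dtype)
    return active[:, None] * w


def mean_radii(n=200, trials=10, seed=0):
    """Average Jacobian spectral radius per scheme over several random draws."""
    rng = np.random.default_rng(seed)
    out = {"glorot_linear": [], "he_relu": [], "orthogonal_linear": []}
    for _ in range(trials):
        x = rng.normal(size=n)  # hypothetical Gaussian input
        out["glorot_linear"].append(spectral_radius(glorot_normal(n, rng)))
        out["he_relu"].append(spectral_radius(relu_jacobian(he_normal(n, rng), x)))
        out["orthogonal_linear"].append(spectral_radius(orthogonal(n, rng)))
    return {name: float(np.mean(vals)) for name, vals in out.items()}


radii = mean_radii()
for name, value in radii.items():
    print(f"{name}: mean spectral radius = {value:.3f}")
```

For the orthogonal case the radius is 1 up to floating-point error, because every eigenvalue of an orthogonal matrix lies on the unit circle. The two Gaussian schemes land near 1 only on average and only for reasonably wide layers.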

Taxonomy

Topics: Neural Networks and Applications · Ferroelectric and Negative Capacitance Devices · Machine Learning in Materials Science