Revisiting Glorot Initialization for Long-Range Linear Recurrences
Noga Bar, Mariia Seleznova, Yotam Alexander, Gitta Kutyniok, Raja Giryes

TL;DR
This paper analyzes the limitations of Glorot initialization for long-range linear RNNs, demonstrating its instability over long sequences and proposing a simple rescaling method to improve stability.
Contribution
The paper reveals the instability of Glorot initialization in long sequences and introduces a dimension-aware rescaling technique to ensure stable RNN training.
Findings
Glorot initialization can cause exploding signals in long sequences.
Sequences whose length is proportional to the square root of the hidden size are sufficient to induce instability.
A simple rescaling of Glorot initialization stabilizes long-range RNNs.
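A rough way to see why small deviations matter, offered as a heuristic sketch rather than the paper's derivation: assume the hidden state follows an input-free linear recurrence $h_t = W h_{t-1}$ (an assumed simplification) and that growth is dominated by an eigenvalue of modulus $\rho = 1 + \varepsilon$ with small $\varepsilon > 0$. The symbols $h_t$, $W$, $\rho$, and $\varepsilon$ are introduced here for illustration only.

```latex
h_t = W^t h_0,
\qquad
\|h_t\| \sim \rho^{t}\,\|h_0\| = (1+\varepsilon)^{t}\,\|h_0\| \approx e^{\varepsilon t}\,\|h_0\|
```

Under this heuristic the amplification stays negligible only while $\varepsilon t \ll 1$, so a deviation that looks harmless at short lengths can dominate once the sequence is long enough.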
Abstract
Proper initialization is critical for Recurrent Neural Networks (RNNs), particularly in long-range reasoning tasks, where repeated application of the same weight matrix can cause vanishing or exploding signals. A common baseline for linear recurrences is Glorot initialization, which is designed to ensure stable signal propagation but is derived under the infinite-width, fixed-length regime, an unrealistic setting for RNNs processing long sequences. In this work, we show that Glorot initialization is in fact unstable: small positive deviations in the spectral radius are amplified through time and cause the hidden state to explode. Our theoretical analysis demonstrates that sequences of length on the order of $\sqrt{n}$, where $n$ is the hidden width, are sufficient to induce instability. To address this, we propose a simple, dimension-aware rescaling of Glorot that shifts the spectral radius slightly below…
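The setting described in the abstract can be explored numerically with a minimal sketch like the one below. It assumes a Gaussian Glorot initialization of a square recurrent matrix (variance 1/n) and an input-free recurrence h_{t+1} = W h_t. The rescaling shown is NOT the paper's dimension-aware formula, which is not given in the text above; it simply divides by the measured spectral radius to hit a hypothetical target `target_rho`. All function and variable names are illustrative.

```python
import numpy as np


def glorot_square(n, rng):
    """Gaussian Glorot init for an n x n matrix: variance 2 / (fan_in + fan_out) = 1 / n."""
    return rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))


def spectral_radius(W):
    """Largest eigenvalue modulus of W."""
    return np.max(np.abs(np.linalg.eigvals(W)))


def hidden_norms(W, h0, steps):
    """Iterate h_{t+1} = W h_t and return the norms ||h_0||, ..., ||h_steps||."""
    h = np.asarray(h0, dtype=float).copy()
    norms = [np.linalg.norm(h)]
    for _ in range(steps):
        h = W @ h
        norms.append(np.linalg.norm(h))
    return np.array(norms)


rng = np.random.default_rng(0)
n = 128                                # hidden width (illustrative choice)
steps = 4 * int(np.sqrt(n))            # a few multiples of sqrt(n)

W = glorot_square(n, rng)
h0 = rng.normal(size=n) / np.sqrt(n)   # initial state with norm close to 1

# Hypothetical rescaling: force the spectral radius to a chosen target below 1.
# This is an assumption for illustration, not the rescaling proposed in the paper.
target_rho = 0.99
W_scaled = (target_rho / spectral_radius(W)) * W

print("spectral radius (Glorot):  ", spectral_radius(W))
print("spectral radius (rescaled):", spectral_radius(W_scaled))
print("final norm (Glorot):  ", hidden_norms(W, h0, steps)[-1])
print("final norm (rescaled):", hidden_norms(W_scaled, h0, steps)[-1])
```

Varying `n`, `steps`, and the random seed gives a feel for how the hidden-state norm behaves as the sequence length grows relative to the width. A single draw is noisy, so averaging over several seeds is advisable before drawing conclusions.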
