Revisiting Glorot Initialization for Long-Range Linear Recurrences

Noga Bar; Mariia Seleznova; Yotam Alexander; Gitta Kutyniok; Raja Giryes

arXiv:2505.19827·cs.LG·May 27, 2025

TL;DR

This paper analyzes the limitations of Glorot initialization for long-range linear RNNs, demonstrating its instability over long sequences and proposing a simple rescaling method to improve stability.

Contribution

The paper shows that Glorot initialization becomes unstable on long sequences and introduces a dimension-aware rescaling that shifts the spectral radius slightly below one to keep signal propagation in linear RNNs stable.

Findings

01. Glorot initialization can cause exploding signals in long sequences.

02. Sequence lengths on the order of the square root of the hidden size are sufficient to induce instability.

03. A simple rescaling of Glorot initialization stabilizes long-range RNNs.
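
A heuristic way to see why lengths of order $\sqrt{n}$ matter, offered as a reading aid rather than the paper's proof: for a linear recurrence $h_t = W h_{t-1}$ (notation assumed here), long-run growth is governed by the spectral radius $\rho(W)$. If $\rho(W) = 1 + \epsilon$ with a small $\epsilon > 0$, then

$$
\rho(W)^t = (1 + \epsilon)^t = e^{t \log(1 + \epsilon)} \approx e^{\epsilon t},
$$

so growth stops being negligible once $t$ is comparable to $1/\epsilon$. Assuming $\epsilon$ is of order $1/\sqrt{n}$ (an assumption chosen to match finding 02, not a value quoted from the paper), that threshold is $t \sim \sqrt{n}$.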

Abstract

Proper initialization is critical for Recurrent Neural Networks (RNNs), particularly in long-range reasoning tasks, where repeated application of the same weight matrix can cause vanishing or exploding signals. A common baseline for linear recurrences is Glorot initialization, designed to ensure stable signal propagation--but derived under the infinite-width, fixed-length regime--an unrealistic setting for RNNs processing long sequences. In this work, we show that Glorot initialization is in fact unstable: small positive deviations in the spectral radius are amplified through time and cause the hidden state to explode. Our theoretical analysis demonstrates that sequences of length $t = O(\sqrt{n})$, where $n$ is the hidden width, are sufficient to induce instability. To address this, we propose a simple, dimension-aware rescaling of Glorot that shifts the spectral radius slightly below…
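
The mechanism described in the abstract can be explored numerically. The following is a minimal NumPy sketch, not the authors' code: it assumes the Glorot normal variant for a square recurrent matrix (variance 2 / (fan_in + fan_out) = 1 / n), iterates the linear recurrence h_t = W h_{t-1}, and tracks the log-norm of the hidden state. The function names and the `shrink` factor are hypothetical; in particular, `shrink` is a fixed placeholder scalar and is not the dimension-aware factor proposed in the paper, which this excerpt does not state.

```python
import numpy as np


def glorot_square(n, rng):
    """Glorot normal init for an n x n matrix: Var = 2 / (n + n) = 1 / n."""
    return rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))


def spectral_radius(W):
    """Largest eigenvalue magnitude of W."""
    return float(np.max(np.abs(np.linalg.eigvals(W))))


def log_norm_trajectory(W, h0, steps):
    """Return log ||h_t|| for t = 1..steps under h_t = W h_{t-1}.

    The state is renormalized at every step and the log-norm is
    accumulated, so long trajectories do not overflow or underflow.
    """
    log_norm = np.log(np.linalg.norm(h0))
    h = h0 / np.linalg.norm(h0)
    out = np.empty(steps)
    for t in range(steps):
        h = W @ h
        step_norm = np.linalg.norm(h)
        log_norm += np.log(step_norm)
        h = h / step_norm
        out[t] = log_norm
    return out


def rescale(W, shrink):
    """Scale W by a scalar. `shrink` is a hypothetical placeholder,
    not the paper's dimension-aware rescaling factor."""
    return shrink * W


def compare(n=64, steps=200, shrink=0.95, seed=0):
    """Log-norm trajectories for plain Glorot and a scalar-rescaled copy."""
    rng = np.random.default_rng(seed)
    W = glorot_square(n, rng)
    h0 = rng.normal(size=n)
    base = log_norm_trajectory(W, h0, steps)
    damped = log_norm_trajectory(rescale(W, shrink), h0, steps)
    return spectral_radius(W), base, damped


if __name__ == "__main__":
    rho, base, damped = compare()
    print("spectral radius of Glorot sample:", rho)
    print("final log-norm (Glorot):", base[-1])
    print("final log-norm (rescaled):", damped[-1])
```

Because the recurrence is linear, scaling W by `shrink` shifts the log-norm at step t by exactly t * log(shrink), so a factor slightly below one subtracts a linear-in-time term from whatever growth the unscaled matrix produces. For a generic initial state the long-run slope of the log-norm approaches log ρ(W), which is why even a small positive excess of ρ(W) over one eventually dominates.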

Videos

Revisiting Glorot Initialization for Long-Range Linear Recurrences · SlidesLive

Taxonomy

Topics: Geological Modeling and Analysis · Hydrocarbon exploration and reservoir analysis