How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences
Mariia Seleznova

TL;DR
This paper analyzes how the accuracy of infinite-width approximations in linear recurrent models deteriorates as recurrent depth grows jointly with width, identifying the critical scaling regimes where finite-width effects become significant.
Contribution
It derives exact finite-width formulas for signal energies and characterizes the joint depth-width regimes affecting signal propagation in linear recurrences.
Findings
The infinite-width approximation remains valid for t = o(√n), where t is the recurrent depth and n the width
Deviations appear at t ∼ c√n, leading to a nontrivial limit
Finite-width effects dominate when t ≫ √n, causing instability
Abstract
We study signal propagation in linear recurrent models at finite width. While existing signal propagation theory relies predominantly on the infinite-width limit, it remains unclear for how long that approximation remains accurate when recurrent depth t grows jointly with width n. This question is especially relevant for modern recurrent sequence models, whose natural operating regime involves long input sequences, i.e., large t. We derive exact finite-width formulas for the hidden state signal energies in linear recurrences under complex Gaussian initialization. Using these formulas, we identify the joint depth-width scaling regimes that govern signal propagation: (i) a subcritical regime t = o(√n), in which the infinite-width approximation remains valid; (ii) a critical regime t ∼ c√n, in which non-negligible deviations from infinite-width predictions appear and a…
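The depth-width question in the abstract can be probed numerically with a small Monte Carlo experiment. The sketch below is illustrative only and is not the paper's method or its exact formulas. It assumes an input-free recurrence h_t = W h_{t-1} with i.i.d. complex Gaussian entries normalized so that E|W_ij|² = 1/n; that normalization, the unit-norm initial state, and the function names are all assumptions made here. Under the assumed scaling, the infinite-width baseline is a mean energy that stays at 1, so departures from 1 as t approaches and exceeds √n indicate finite-width effects.

```python
import math
import random


def sample_complex_gaussian_matrix(n, rng):
    """n x n matrix with i.i.d. complex Gaussian entries, E|W_ij|^2 = 1/n (assumed scaling)."""
    sigma = math.sqrt(1.0 / (2.0 * n))  # std of the real and of the imaginary part
    return [
        [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]
        for _ in range(n)
    ]


def mean_signal_energy(n, t_max, trials=50, seed=0):
    """Monte Carlo estimate of E||h_t||^2 for h_t = W h_{t-1}, t = 0..t_max."""
    rng = random.Random(seed)
    totals = [0.0] * (t_max + 1)
    for _ in range(trials):
        w = sample_complex_gaussian_matrix(n, rng)  # fresh weights per trial
        h = [complex(1.0, 0.0)] + [complex(0.0, 0.0)] * (n - 1)  # unit-norm h_0
        totals[0] += 1.0
        for t in range(1, t_max + 1):
            # One recurrence step: plain matrix-vector product.
            h = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in w]
            totals[t] += sum(abs(z) ** 2 for z in h)
    return [total / trials for total in totals]


if __name__ == "__main__":
    n = 16
    energies = mean_signal_energy(n, t_max=3 * int(math.sqrt(n)))
    for t, e in enumerate(energies):
        print(f"t={t:2d}  t/sqrt(n)={t / math.sqrt(n):.2f}  mean energy={e:.3f}")
```

The code uses only the standard library so it runs anywhere, at the cost of speed; for larger n a vectorized array library would be the natural replacement. Because the energy distribution may become heavy-tailed at large t, the sample mean can be noisy there, and more trials may be needed before reading anything into the numbers.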
