Criticality and Saturation in Orthogonal Neural Networks
Max Guillen, Jan E. Gerken

TL;DR
This paper provides a theoretical framework explaining why the tensors in the finite-width expansion of orthogonally initialized neural networks stabilize at large depth, which helps account for the training performance and depth stability of such networks.
Contribution
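It derives explicit layer-wise recursion relations for the finite-width tensor statistics of orthogonally initialized networks, which account for the observed stability of these tensors, and extends the Feynman-diagram methods for the corresponding i.i.d. recursions to all orders in the inverse width.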
Findings
Recursion relations reproduce observed tensor stability in orthogonal networks.
Theoretical results match Monte Carlo simulations and large-depth expansions.
Orthogonal initialization improves stability and training performance.
Abstract
It has been known for a long time that initializing weight matrices to be orthogonal instead of having i.i.d. Gaussian components can improve training performance. This phenomenon can be analyzed using finite-width corrections, where the infinite-width statistics are supplemented by a power series in the inverse width. In particular, recent empirical results by Day et al. show that the tensors appearing in this treatment stabilize for large depth, as opposed to the tensors of i.i.d.-initialized networks. In this article, we derive explicit layer-wise recursion relations for the tensors appearing in the finite-width expansion of the network statistics in the case of orthogonal initializations. We also provide an extension of recently introduced Feynman diagrams for the corresponding recursions in the i.i.d. case, which are valid to all orders in the inverse width. Finally, we show…
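The contrast between the two initializations described in the abstract can be illustrated with a small toy experiment. The sketch below is not the paper's method: it assumes a deep linear network (no activation function) with square layers, and the names `haar_orthogonal` and `norm_ratio_through_depth` are hypothetical helpers introduced here for illustration. It samples orthogonal weights through a QR decomposition of a Gaussian matrix and compares how much the squared norm of a signal fluctuates across random draws after many layers.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix via QR of a Gaussian matrix."""
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # Flip column signs so the result does not depend on the QR sign
    # convention (this yields a Haar-distributed orthogonal matrix).
    d = np.sign(np.diag(r))
    d[d == 0] = 1.0
    return q * d


def norm_ratio_through_depth(n, depth, init, rng):
    """Return |h_L|^2 / |x|^2 for a toy deep *linear* network (assumption)."""
    x = rng.standard_normal(n)
    h = x.copy()
    for _ in range(depth):
        if init == "orthogonal":
            w = haar_orthogonal(n, rng)
        else:
            # i.i.d. Gaussian entries with variance 1/n.
            w = rng.standard_normal((n, n)) / np.sqrt(n)
        h = w @ h
    return np.linalg.norm(h) ** 2 / np.linalg.norm(x) ** 2


rng = np.random.default_rng(0)
n, depth, trials = 32, 20, 50  # illustrative values, not from the paper

orth_ratios = np.array(
    [norm_ratio_through_depth(n, depth, "orthogonal", rng) for _ in range(trials)]
)
gauss_ratios = np.array(
    [norm_ratio_through_depth(n, depth, "gaussian", rng) for _ in range(trials)]
)

print("orthogonal: std of norm ratio =", orth_ratios.std())
print("gaussian:   std of norm ratio =", gauss_ratios.std())
```

In this linear toy setting the orthogonal ratio equals one up to floating-point error, because orthogonal matrices preserve the Euclidean norm, while the Gaussian ratio fluctuates from draw to draw at finite width. The nonlinear networks and higher-order tensors treated in the paper require the recursion relations it derives; this sketch only conveys the intuition.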
