Almost Sure Asymptotic Freeness of Neural Network Jacobian with Orthogonal Weights
Tomohiro Hayase

TL;DR
This paper proves that, in the wide limit of deep neural networks whose weights are Haar-distributed orthogonal matrices, the layer-wise Jacobians are almost surely asymptotically free, which gives free-probability analyses of the Jacobian spectrum a rigorous basis.
Contribution
It establishes the almost sure asymptotic freeness of layer-wise Jacobians in deep neural networks with Haar-distributed orthogonal weight initialization, so that free probability theory can be applied rigorously to the Jacobian spectrum.
Findings
Layer-wise Jacobians are almost surely asymptotically free in the wide limit
The result covers weights initialized as Haar-distributed orthogonal matrices
Asymptotic freeness lets free probability theory describe the Jacobian spectrum, whose conditioning governs exploding or vanishing gradients
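As a rough numerical illustration of what asymptotic freeness means, the sketch below checks one mixed moment. For free elements a and b with normalized trace tr, free probability gives tr(abab) = tr(a^2) tr(b)^2 + tr(a)^2 tr(b^2) - tr(a)^2 tr(b)^2. The code compares both sides for a fixed diagonal matrix and an independent diagonal matrix rotated by a Haar orthogonal matrix. This is a simplified setting, not the paper's: in a network the diagonal factors depend on the weights, which is the hard part the paper handles. The names haar_orthogonal, ntrace and freeness_gap, and the choice of 0/1 diagonals, are assumptions made for this example.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Sample an n x n Haar-distributed orthogonal matrix via QR."""
    z = rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    # Multiply column j by sign(r_jj) so the QR output is exactly Haar.
    return q * np.sign(np.diag(r))


def ntrace(m):
    """Normalized trace tr(M) = Tr(M) / n."""
    return np.trace(m) / m.shape[0]


def freeness_gap(n=400, seed=0):
    """Return (tr(abab), value predicted by freeness) for one sample."""
    rng = np.random.default_rng(seed)
    # Illustrative choice: random 0/1 diagonal matrices (projections).
    a = np.diag(rng.integers(0, 2, size=n).astype(float))
    b0 = np.diag(rng.integers(0, 2, size=n).astype(float))
    w = haar_orthogonal(n, rng)
    b = w @ b0 @ w.T  # rotate b0 by an independent Haar orthogonal matrix
    lhs = ntrace(a @ b @ a @ b)
    ta, ta2 = ntrace(a), ntrace(a @ a)
    tb, tb2 = ntrace(b), ntrace(b @ b)
    rhs = ta2 * tb**2 + ta**2 * tb2 - ta**2 * tb**2
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = freeness_gap()
    print(f"tr(abab) = {lhs:.4f}, freeness prediction = {rhs:.4f}")
```

The two numbers should agree up to a finite-size error that shrinks as n grows; without the Haar rotation (b = b0) they generally would not.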
Abstract
A well-conditioned Jacobian spectrum plays a vital role in preventing exploding or vanishing gradients and in speeding up the learning of deep neural networks. Free probability theory helps us understand and handle the Jacobian spectrum. We rigorously show almost sure asymptotic freeness of the layer-wise Jacobians of deep neural networks in the wide limit. In particular, we treat the case where the weights are initialized as Haar-distributed orthogonal matrices.
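To make the object in the abstract concrete, the sketch below builds the input-output Jacobian of a fully connected network as a product of layer-wise factors, J = D_L W_L ... D_1 W_1, where each W_l is a Haar orthogonal matrix and D_l is the diagonal matrix of activation derivatives at layer l. This is a minimal sketch under assumptions that do not come from this summary: the notation is the usual one for this kind of analysis, the hard-tanh activation, width, depth and Gaussian input are illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Sample an n x n Haar-distributed orthogonal matrix via QR."""
    z = rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    # Multiply column j by sign(r_jj) so the QR output is exactly Haar.
    return q * np.sign(np.diag(r))


def jacobian_singular_values(width=200, depth=10, seed=0):
    """Singular values of J = D_L W_L ... D_1 W_1 at a random input."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)  # assumed input; entries of order 1
    jac = np.eye(width)
    for _ in range(depth):
        w = haar_orthogonal(width, rng)
        h = w @ x  # pre-activation of this layer
        # Hard-tanh (assumed activation): derivative is 1 where |h| < 1.
        d = (np.abs(h) < 1.0).astype(float)
        # Left-multiply by the layer Jacobian D_l W_l.
        jac = (d[:, None] * w) @ jac
        x = np.clip(h, -1.0, 1.0)  # post-activation, input to next layer
    return np.linalg.svd(jac, compute_uv=False)


if __name__ == "__main__":
    s = jacobian_singular_values()
    print("largest singular value:", s.max())
    print("fraction of nonzero singular values:", np.mean(s > 1e-10))
```

Because every factor has operator norm at most 1 in this setup, the singular values cannot exceed 1; how they are distributed below that bound is the kind of question the free-probability description of the spectrum addresses.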
Taxonomy
Topics: Model Reduction and Neural Networks · Markov Chains and Monte Carlo Methods · Stochastic Gradient Optimization Techniques
