Mean Field Residual Networks: On the Edge of Chaos
Greg Yang, Samuel S. Schoenholz

TL;DR
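This paper analyzes randomly initialized residual networks using mean field theory. It shows that skip connections yield subexponential, and in many cases polynomial, forward and backward dynamics that preserve input geometry and gradient flow far better than the exponential dynamics of classical feedforward networks, and it predicts optimal initializations for training.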
Contribution
It introduces a theoretical framework for the dynamics of randomly initialized residual networks, derives the exponents of the resulting polynomial laws, and predicts optimal initializations as a function of network depth.
Findings
Depending on the nonlinearity, residual networks exhibit subexponential, and in many cases polynomial, forward and backward dynamics.
Theoretical predictions accurately match empirical training performance on MNIST.
Common initializations like Xavier or He are suboptimal for residual networks, with optimal variances depending on depth.
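The contrast between exponential and subexponential backward dynamics can be probed numerically. The following is a minimal sketch, not the authors' experimental setup: it assumes a simplified residual block x + tanh(Wx + b) (the paper's blocks may include additional weights and biases), and the function name `grad_norm_ratio`, the width, the depth, and the values `sigma_w = 1.0` and `sigma_b = 0.5` are illustrative choices. It backpropagates a random vector through the layer Jacobians and reports how much its norm changes.

```python
import numpy as np

def grad_norm_ratio(depth, width, residual, sigma_w=1.0, sigma_b=0.5, seed=0):
    """Ratio |g_input| / |g_output| for a random vector g backpropagated
    through a randomly initialized tanh network (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    jacobians = []
    for _ in range(depth):
        # Fresh Gaussian weights and biases per layer (assumed initialization).
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal(width) * sigma_b
        h = W @ x + b
        # Jacobian of tanh(W x + b) with respect to x: diag(1 - tanh(h)^2) @ W.
        J = (1.0 - np.tanh(h) ** 2)[:, None] * W
        if residual:
            J = J + np.eye(width)   # the skip connection adds the identity
            x = x + np.tanh(h)
        else:
            x = np.tanh(h)
        jacobians.append(J)
    # Backpropagate a random output-side vector through all layers.
    g = rng.standard_normal(width)
    g0 = np.linalg.norm(g)
    for J in reversed(jacobians):
        g = J.T @ g
    return float(np.linalg.norm(g) / g0)

if __name__ == "__main__":
    print("feedforward:", grad_norm_ratio(30, 200, residual=False))
    print("residual:   ", grad_norm_ratio(30, 200, residual=True))
```

Under these assumed settings the feedforward ratio should shrink by orders of magnitude over 30 layers, while the residual ratio should not collapse. The exact numbers depend on the seed and width and are not taken from the paper.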
Abstract
We study randomly initialized residual networks using mean field theory and the theory of difference equations. Classical feedforward neural networks, such as those with tanh activations, exhibit exponential behavior on the average when propagating inputs forward or gradients backward. The exponential forward dynamics causes rapid collapsing of the input space geometry, while the exponential backward dynamics causes drastic vanishing or exploding gradients. We show, in contrast, that by adding skip connections, the network will, depending on the nonlinearity, adopt subexponential forward and backward dynamics, and in many cases in fact polynomial. The exponents of these polynomials are obtained through analytic methods and proved and verified empirically to be correct. In terms of the "edge of chaos" hypothesis, these subexponential and polynomial laws allow residual networks to "hover…
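The collapse of input geometry described in the abstract can be illustrated with a small simulation. This is a hedged sketch under stated assumptions rather than the paper's method: the residual block is simplified to x + tanh(Wx + b), and the helper names `cosine` and `forward_cosines`, together with the width, depth, and variance parameters, are hypothetical choices made for illustration. It tracks the cosine similarity between two random inputs as they pass through the same random network.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def forward_cosines(depth, width, residual, sigma_w=1.0, sigma_b=0.5, seed=0):
    """Cosine similarity of two random inputs after each layer of a randomly
    initialized tanh network, with or without skip connections."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(width)
    x2 = rng.standard_normal(width)
    cos = [cosine(x1, x2)]
    for _ in range(depth):
        # Both inputs share the same weights and biases at every layer.
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal(width) * sigma_b
        if residual:
            x1 = x1 + np.tanh(W @ x1 + b)
            x2 = x2 + np.tanh(W @ x2 + b)
        else:
            x1 = np.tanh(W @ x1 + b)
            x2 = np.tanh(W @ x2 + b)
        cos.append(cosine(x1, x2))
    return cos

if __name__ == "__main__":
    ff = forward_cosines(30, 200, residual=False)
    res = forward_cosines(30, 200, residual=True)
    print("feedforward cosine after 30 layers:", ff[-1])
    print("residual cosine after 30 layers:   ", res[-1])
```

With these assumed parameters the feedforward network drives the two inputs toward the same direction (cosine near 1) within a few dozen layers, whereas the residual network changes the cosine much more slowly, so more of the original input geometry survives at depth.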
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Machine Learning in Materials Science
