Width and Depth Limits Commute in Residual Networks
Soufiane Hayou, Greg Yang

TL;DR
This paper shows that in residual networks whose branches are scaled by 1/sqrt(depth), the limiting covariance structure is the same regardless of the order in which width and depth are taken to infinity. This explains why infinite-width-then-depth analysis remains informative in practice, and it shows that the pre-activations are Gaussian, which is useful for Bayesian deep learning.
Contribution
It shows that the width and depth limits commute in residual networks whose branches are scaled by 1/sqrt(depth). This gives a theoretical justification for existing infinite-width-then-depth analyses and establishes that the pre-activations are Gaussian in this regime.
Findings
Covariance structure is invariant to the order of taking width and depth to infinity.
Pre-activations follow Gaussian distributions under the studied scaling.
Simulation results match theoretical predictions closely.
Abstract
We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by 1/sqrt(depth) (the only nontrivial scaling), results in the same covariance structure no matter how that limit is taken. This explains why the standard infinite-width-then-depth approach provides practical insights even for networks with depth of the same order as width. We also demonstrate that the pre-activations, in this case, have Gaussian distributions, which has direct applications in Bayesian deep learning. We conduct extensive simulations that show an excellent match with our theoretical findings.
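The claim in the abstract can be probed numerically with a small simulation. The sketch below is illustrative only and is not the authors' code: it assumes a fully connected residual network with a ReLU activation, input weights drawn from N(0, 1/d), hidden weights drawn from N(0, 1/width), and the update y <- y + W relu(y) / sqrt(depth). The function names final_preactivations and average_second_moment are hypothetical. Under these assumptions, the infinite-width recursion for the second moment is q_l = q_{l-1} (1 + 1/(2 depth)), which tends to q_0 exp(1/2) as depth grows, so exp(1/2) serves as a reference value when q_0 = 1.

```python
import numpy as np


def final_preactivations(x, width, depth, rng):
    """Run one randomly initialised residual network; return last-layer pre-activations."""
    d = x.shape[0]
    # Input layer: entries N(0, 1/d) (assumed initialisation).
    w_in = rng.normal(0.0, 1.0 / np.sqrt(d), size=(width, d))
    y = w_in @ x
    for _ in range(depth):
        # Hidden weights: entries N(0, 1/width) (assumed initialisation).
        w = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        # Residual update with the branch scaled by 1/sqrt(depth).
        # ReLU is an assumed choice of activation.
        y = y + (w @ np.maximum(y, 0.0)) / np.sqrt(depth)
    return y


def average_second_moment(x, width, depth, n_trials=4, seed=0):
    """Average of mean(y**2) over n_trials independent random networks."""
    rng = np.random.default_rng(seed)
    values = [
        np.mean(final_preactivations(x, width, depth, rng) ** 2)
        for _ in range(n_trials)
    ]
    return float(np.mean(values))


if __name__ == "__main__":
    x = np.ones(16)  # chosen so that ||x||^2 / d = 1, i.e. q_0 = 1
    print("ReLU reference value exp(1/2):", np.exp(0.5))
    # Width much larger than depth, then depth comparable to width.
    for width, depth in [(512, 8), (256, 64), (128, 128)]:
        print(width, depth, average_second_moment(x, width, depth))
```

If the limits commute as the paper states, the printed values should cluster around the same reference value whether width dominates depth or the two are of the same order, up to finite-size fluctuations that shrink as width grows.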
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference · Bayesian Modeling and Causal Inference
