Width and Depth Limits Commute in Residual Networks

Soufiane Hayou; Greg Yang

arXiv:2302.00453·stat.ML·August 11, 2023·1 cites

TL;DR

This paper shows that in residual networks whose branches are scaled by $1/\sqrt{\text{depth}}$, the covariance structure is the same regardless of the order in which width and depth are taken to infinity. This explains why infinite-width-then-depth analysis remains informative in practice, and it shows that the pre-activations are Gaussian, which is useful for Bayesian deep learning.

Contribution

The paper shows that the width and depth limits commute in residual networks whose branches are scaled by $1/\sqrt{\text{depth}}$. This gives a theoretical justification for the existing infinite-width-then-depth analysis and establishes that the pre-activations are Gaussian in this regime.

Findings

01

Covariance structure is invariant to the order of taking width and depth to infinity.

02

Pre-activations follow Gaussian distributions under the studied scaling.

03

Simulation results match theoretical predictions closely.

Abstract

We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{\text{depth}}$ (the only nontrivial scaling), results in the same covariance structure no matter how that limit is taken. This explains why the standard infinite-width-then-depth approach provides practical insights even for networks with depth of the same order as width. We also demonstrate that the pre-activations, in this case, have Gaussian distributions, which has direct applications in Bayesian deep learning. We conduct extensive simulations that show an excellent match with our theoretical findings.
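The scaling described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a fully connected residual network of the form $Y_l = Y_{l-1} + \frac{1}{\sqrt{L}} W_l\,\phi(Y_{l-1})$ with ReLU $\phi$ and Gaussian weights of variance $1/\text{width}$, and the function names (`resnet_preactivations`, `empirical_q`, `infinite_width_q`) are hypothetical. Under these assumptions, the infinite-width recursion for the normalized squared norm is $q_l = q_{l-1}(1 + \frac{1}{2L})$, which tends to $q_0 e^{1/2}$ as the depth $L$ grows; the simulation lets one compare a finite network against that reference.

```python
import numpy as np


def resnet_preactivations(x, width, depth, rng):
    """Forward pass of an assumed ResNet with branches scaled by 1/sqrt(depth).

    Assumed model (not taken from the page above):
        Y_0 = W_in x,   Y_l = Y_{l-1} + depth**-0.5 * W_l relu(Y_{l-1}),
    with W_in entries ~ N(0, 1/d) and W_l entries ~ N(0, 1/width).
    """
    d = x.shape[0]
    w_in = rng.normal(0.0, 1.0 / np.sqrt(d), size=(width, d))
    y = w_in @ x
    branch_scale = 1.0 / np.sqrt(depth)  # the 1/sqrt(depth) branch scaling
    for _ in range(depth):
        # Fresh Gaussian weights for each residual block.
        w = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        y = y + branch_scale * (w @ np.maximum(y, 0.0))
    return y


def empirical_q(x, width, depth, seed=0):
    """Normalized squared norm ||Y_L||^2 / width of the last pre-activations."""
    rng = np.random.default_rng(seed)
    y = resnet_preactivations(x, width, depth, rng)
    return float(y @ y) / width


def infinite_width_q(q0, depth):
    """Infinite-width reference for the assumed ReLU model.

    For Gaussian pre-activations, E[relu(y)^2] = q / 2, so each block
    multiplies q by (1 + 1 / (2 * depth)).
    """
    return q0 * (1.0 + 1.0 / (2.0 * depth)) ** depth


if __name__ == "__main__":
    x = np.ones(16)
    q0 = float(x @ x) / x.shape[0]
    # Depth of the same order as width, the regime the abstract refers to.
    for n in (64, 128, 256):
        print(n, empirical_q(x, width=n, depth=n), infinite_width_q(q0, n))
```

Running the script with depth equal to width prints the simulated value next to the infinite-width reference for each size. A single random draw fluctuates around the reference, so averaging over several seeds gives a cleaner comparison.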

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference · Bayesian Modeling and Causal Inference