Commutative Width and Depth Scaling in Deep Neural Networks
Soufiane Hayou

TL;DR
This paper investigates the conditions under which the limit of a neural function does not depend on the order in which width and depth are taken to infinity, focusing on the neural covariance kernel in networks with skip connections, and discusses implications for neural network design.
Contribution
It introduces a formal framework for commutativity in neural network scaling, extending previous results to networks with skip connections and scaled branches.
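One hedged way to read commutativity in symbols is sketched below. The notation is assumed for illustration and is not taken from the paper: F_{n,L} stands for a neural function at width n and depth L, and F_∞ for its common limit. The mode of convergence is deliberately left unspecified here.

```latex
% F_{n,L}: a neural function at width n and depth L (notation assumed here).
% Commutativity: every way of sending n and L to infinity gives the same limit.
\lim_{n \to \infty} \lim_{L \to \infty} F_{n,L}
  \;=\; \lim_{L \to \infty} \lim_{n \to \infty} F_{n,L}
  \;=\; \lim_{\min(n,L) \to \infty} F_{n,L}
  \;=\; F_{\infty}.
```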
Findings
In networks with skip connections whose branches are suitably scaled, taking width and depth to infinity yields the same covariance structure regardless of the order in which the limits are taken.
The results generalize previous work by including networks with skip connections.
The proof techniques are accessible and do not require stochastic calculus.
Abstract
This paper is the second in the series Commutative Scaling of Width and Depth (WD) about commutativity of infinite width and depth limits in deep neural networks. Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually identify settings under which commutativity holds, i.e. the neural function tends to the same limit no matter how width and depth limits are taken. In this paper, we formally introduce and define the commutativity framework, and discuss its implications for neural network design and scaling. We study commutativity for the neural covariance kernel, which reflects how network layers separate data. Our findings extend previous results established in [55] by showing that taking the width and depth to infinity in a deep neural network with skip connections, when…
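The setting described in the abstract can be probed numerically. The sketch below is a minimal illustration and not the paper's code. It assumes a ReLU residual network with Gaussian weights whose residual branches are scaled by 1/sqrt(depth), and it estimates the covariance between the last-layer features of two inputs. The function name `neural_covariance`, the choice of ReLU, and the initialization variances are all assumptions made for illustration.

```python
import numpy as np


def neural_covariance(x1, x2, width, depth, seed=0):
    """Empirical covariance <Y_L(x1), Y_L(x2)> / width for a random residual net.

    Assumed (hypothetical) model: Y_0 = W_in x and
    Y_l = Y_{l-1} + W_l relu(Y_{l-1}) / sqrt(depth),
    with i.i.d. Gaussian weights of variance 1 / fan_in.
    """
    rng = np.random.default_rng(seed)
    inputs = np.stack([x1, x2], axis=1)  # shape (d, 2): one column per input
    d = inputs.shape[0]

    # Input layer: entries drawn from N(0, 1/d).
    w_in = rng.normal(0.0, 1.0 / np.sqrt(d), size=(width, d))
    y = w_in @ inputs  # shape (width, 2)

    for _ in range(depth):
        # Hidden layer: entries drawn from N(0, 1/width).
        w = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        # Skip connection plus a residual branch scaled by 1/sqrt(depth).
        y = y + (w @ np.maximum(y, 0.0)) / np.sqrt(depth)

    # Normalized inner product of the two feature vectors.
    return float(y[:, 0] @ y[:, 1]) / width
```

To explore commutativity empirically, one could evaluate this function on a grid of (width, depth) pairs, average over several seeds, and check whether the values reached along different paths to large width and depth agree up to sampling noise.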
Taxonomy
Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
