Doubly infinite residual neural networks: a diffusion process approach
Stefano Peluchetti, Stefano Favaro

TL;DR
This paper explores the behavior of doubly infinite residual neural networks, where both depth and width grow unboundedly, revealing their convergence properties and limitations in expressiveness through a diffusion process framework.
Contribution
It extends diffusion process analysis from infinitely deep to doubly infinite ResNets, providing new insights into their training dynamics and expressive power.
Findings
Doubly infinite ResNets' dynamics converge to deterministic limits at initialization.
Analytical expressions for inference are derived for both weakly and fully trained ResNets.
Limited expressive power is identified for shallow residual blocks with i.i.d. parameters.
Abstract
Modern neural networks (NNs) featuring a large number of layers (depth) and units per layer (width) have achieved remarkable performance across many domains. While there exists a vast literature on the interplay between infinitely wide NNs and Gaussian processes, little is known about analogous interplays with respect to infinitely deep NNs. NNs with independent and identically distributed (i.i.d.) initializations exhibit undesirable forward and backward propagation properties as the number of layers increases. To overcome these drawbacks, Peluchetti and Favaro (2020) considered fully-connected residual networks (ResNets) with network parameters initialized by means of distributions that shrink as the number of layers increases, thus establishing an interplay between infinitely deep ResNets and solutions to stochastic differential equations, i.e. diffusion processes, and showing…
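The effect of shrinking initializations described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' exact parameterization: the residual update x ← x + tanh(Wx + b), the Gaussian initialization, the 1/depth variance scaling, and the names `mean_square_output`, `sigma_w`, and `sigma_b` are all chosen here for illustration. With the variance shrunk by 1/depth, each layer behaves roughly like one Euler-Maruyama step of a stochastic differential equation, so the output magnitude should stay of order one as depth grows; without shrinking, it grows with depth.

```python
import numpy as np


def mean_square_output(depth, width=64, shrink=True,
                       sigma_w=1.0, sigma_b=1.0, seed=0):
    """Propagate an all-ones input through a randomly initialized ResNet.

    Illustrative assumption: each layer applies x <- x + tanh(W x + b) with
    fresh Gaussian parameters. When `shrink` is True the parameter variances
    are divided by `depth`, mimicking initializations that shrink as the
    number of layers increases. Returns the mean squared final activation.
    """
    rng = np.random.default_rng(seed)
    x = np.ones(width)
    scale = 1.0 / depth if shrink else 1.0
    for _ in range(depth):
        # Weight variance sigma_w^2 * scale / width, bias variance sigma_b^2 * scale.
        W = rng.normal(0.0, sigma_w * np.sqrt(scale / width), size=(width, width))
        b = rng.normal(0.0, sigma_b * np.sqrt(scale), size=width)
        x = x + np.tanh(W @ x + b)
    return float(np.mean(x ** 2))


if __name__ == "__main__":
    for depth in (50, 200, 800):
        shrunk = mean_square_output(depth, shrink=True)
        fixed = mean_square_output(depth, shrink=False)
        print(f"depth={depth:4d}  shrinking init: {shrunk:8.2f}  "
              f"fixed init: {fixed:8.2f}")
```

Running the script prints the mean squared output for both initializations at several depths. The sketch only probes forward propagation at initialization for a single fixed width; it does not reproduce the doubly infinite limit or the training results discussed in the paper.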
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
Methods: Diffusion
