Doubly infinite residual neural networks: a diffusion process approach

Stefano Peluchetti; Stefano Favaro

arXiv:2007.03253·stat.ML·September 21, 2021

Open Access

TL;DR

This paper explores the behavior of doubly infinite residual neural networks, where both depth and width grow unboundedly, revealing their convergence properties and limitations in expressiveness through a diffusion process framework.

Contribution

It extends diffusion process analysis from infinitely deep to doubly infinite ResNets, providing new insights into their training dynamics and expressive power.

Findings

01. Doubly infinite ResNets' dynamics converge to deterministic limits at initialization.

02. Analytical expressions for inference are derived for both weakly and fully trained ResNets.

03. Limited expressive power is identified for shallow residual blocks with i.i.d. parameters.

Abstract

Modern neural networks (NNs) featuring a large number of layers (depth) and units per layer (width) have achieved remarkable performance across many domains. While there exists a vast literature on the interplay between infinitely wide NNs and Gaussian processes, little is known about analogous interplays with respect to infinitely deep NNs. NNs with independent and identically distributed (i.i.d.) initializations exhibit undesirable forward and backward propagation properties as the number of layers increases. To overcome these drawbacks, Peluchetti and Favaro (2020) considered fully-connected residual networks (ResNets) with network parameters initialized by means of distributions that shrink as the number of layers increases, thus establishing an interplay between infinitely deep ResNets and solutions to stochastic differential equations, i.e. diffusion processes, and showing…
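The idea of initializing parameters with distributions that shrink as depth grows can be illustrated with a small simulation. The sketch below is not the authors' parameterization, which the truncated abstract does not spell out. It assumes Gaussian weights and biases whose variance is proportional to a step size `dt = total_time / depth`, a `tanh` activation, and a `1 / width` weight scaling. The names `resnet_forward`, `rms`, `sigma_w`, `sigma_b`, and `total_time` are hypothetical. Under these assumptions each residual update resembles one Euler-Maruyama step of a stochastic differential equation, so the output scale should stay roughly stable as layers are added, whereas unscaled i.i.d. parameters let it drift with depth.

```python
import numpy as np


def rms(x):
    """Root-mean-square magnitude of a vector."""
    return float(np.sqrt(np.mean(np.square(x))))


def resnet_forward(x0, depth, sigma_w=1.0, sigma_b=0.1,
                   total_time=1.0, shrink=True, seed=0):
    """Forward pass of a fully-connected ResNet at random initialization.

    Fresh Gaussian parameters are drawn for every layer. With shrink=True
    their variance is proportional to dt = total_time / depth (an assumed
    scaling for illustration); with shrink=False the variance does not
    depend on depth.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    width = x.shape[0]
    dt = total_time / depth if shrink else 1.0
    for _ in range(depth):
        # Weight std carries sqrt(dt) (depth scaling) and 1/sqrt(width).
        W = rng.normal(0.0, sigma_w * np.sqrt(dt / width), size=(width, width))
        b = rng.normal(0.0, sigma_b * np.sqrt(dt), size=width)
        # Residual update: identity skip connection plus a small random block.
        x = x + np.tanh(W @ x + b)
    return x


if __name__ == "__main__":
    x0 = np.ones(64)
    for depth in (10, 100, 1000):
        shrunk = rms(resnet_forward(x0, depth, shrink=True))
        unscaled = rms(resnet_forward(x0, depth, shrink=False))
        print(f"depth={depth:5d}  shrunk rms={shrunk:.3f}  unscaled rms={unscaled:.3f}")
```

Running the script prints the output magnitude for several depths under both initializations, which makes the contrast in forward propagation visible without relying on any particular limit theorem.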

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Stochastic Gradient Optimization Techniques

Methods: Diffusion