Infinite-width limit of deep linear neural networks

Lénaïc Chizat; Maria Colombo; Xavier Fernández-Real; Alessio Figalli

arXiv:2211.16980 · cs.LG · December 1, 2022 · 5 citations

TL;DR

This paper analyzes deep linear neural networks as their width tends to infinity. It shows that the training dynamics converge to a deterministic limit and that, in the continuous-time limit, the linear predictor converges at an exponential rate to the minimal $\ell_2$-norm minimizer of the risk.

Contribution

It provides a rigorous analysis of the infinite-width limit of deep linear neural networks, including the convergence of the training dynamics and a precise description of the law of the weights during training.
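A deep linear network composes weight matrices with no nonlinearities in between, so the whole network computes a single linear predictor: the product W_L ⋯ W_1. The Python sketch below illustrates this collapse at finite width. It is illustrative only: the Gaussian N(0, 1/fan_in) initialization, the layer sizes, and the helper names (init_deep_linear, forward, end_to_end) are hypothetical choices, not necessarily the parametrization or scaling used in the paper or its repository.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def init_deep_linear(d_in, width, depth, d_out=1, seed=0):
    """Hypothetical init: `depth` layers with i.i.d. N(0, 1/fan_in) entries.

    Layer l is a (fan_out x fan_in) matrix; hidden layers have `width` neurons.
    """
    rng = random.Random(seed)
    dims = [d_in] + [width] * (depth - 1) + [d_out]
    return [[[rng.gauss(0.0, 1.0 / fan_in ** 0.5) for _ in range(fan_in)]
             for _ in range(fan_out)]
            for fan_in, fan_out in zip(dims[:-1], dims[1:])]

def forward(weights, x):
    """Apply W_1, ..., W_L in order to the input x (no activation functions)."""
    h = [[xi] for xi in x]  # x as a column vector
    for W in weights:
        h = matmul(W, h)
    return [row[0] for row in h]

def end_to_end(weights):
    """Collapse the network into the single matrix W_L ... W_1 (its linear predictor)."""
    P = weights[0]
    for W in weights[1:]:
        P = matmul(W, P)
    return P

# Usage: a 4-layer network with 50 hidden neurons per layer on 3-dimensional inputs.
weights = init_deep_linear(d_in=3, width=50, depth=4)
x = [0.5, -1.0, 2.0]
beta = end_to_end(weights)  # 1 x 3 matrix
out_network = forward(weights, x)
out_linear = [sum(b * xi for b, xi in zip(beta[0], x))]
```

Both outputs agree up to floating-point rounding, which is why the training of a deep linear network can be studied through the evolution of its end-to-end linear predictor, even though the individual weights follow a richer dynamics.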

Findings

01

Training dynamics converge to a deterministic limit as width increases.

02

The linear predictor converges exponentially fast to the minimal $\ell_2$-norm minimizer of the risk.

03

The law of the weights along the training dynamics can be precisely characterized.

Abstract

This paper studies the infinite-width limit of deep linear neural networks initialized with random parameters. We obtain that, when the number of neurons diverges, the training dynamics converge (in a precise sense) to the dynamics obtained from a gradient descent on an infinitely wide deterministic linear neural network. Moreover, even if the weights remain random, we get their precise law along the training dynamics, and prove a quantitative convergence result of the linear predictor in terms of the number of neurons. We finally study the continuous-time limit obtained for infinitely wide linear neural networks and show that the linear predictors of the neural network converge at an exponential rate to the minimal $\ell_2$-norm minimizer of the risk.
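The limit object named in the abstract, the minimal $\ell_2$-norm minimizer of the risk, can be illustrated in the simplest depth-1 case. Assuming a squared-loss risk $\frac{1}{2n}\|X\beta - y\|^2$ with more features than samples and $X$ of full row rank, the minimal-norm minimizer is $\beta^* = X^\top (X X^\top)^{-1} y$, and plain gradient descent started from $\beta = 0$ converges to it geometrically because its iterates never leave the row space of $X$. This is a textbook analogue, not the paper's infinite-width deep dynamics; the loss, step size, toy data, and function names are assumptions made for the sketch.

```python
import math

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def min_norm_minimizer(X, y):
    """Minimal l2-norm solution of X beta = y (assumes X has full row rank)."""
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]  # X X^T
    return matvec(transpose(X), solve(gram, y))  # beta* = X^T (X X^T)^{-1} y

def gradient_descent(X, y, steps):
    """Gradient descent on (1/2n)||X beta - y||^2 from beta = 0; returns all iterates."""
    n, d = len(X), len(X[0])
    Xt = transpose(X)
    # Assumed step size eta = n / ||X||_F^2, which is at most 1 / lambda_max(X^T X / n).
    eta = n / sum(x * x for row in X for x in row)
    beta = [0.0] * d
    iterates = [beta]
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(X, beta), y)]
        grad = [g / n for g in matvec(Xt, residual)]
        beta = [b - eta * g for b, g in zip(beta, grad)]
        iterates.append(beta)
    return iterates

# Usage: 2 samples, 3 features, so the risk has infinitely many minimizers.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
beta_star = min_norm_minimizer(X, y)  # → [0.0, 1.0, 1.0]
iterates = gradient_descent(X, y, steps=100)
errors = [math.dist(b, beta_star) for b in iterates]
```

With this step size, every nonzero eigenvalue $\lambda$ of $X^\top X / n$ satisfies $\eta\lambda \le 1$, so the distance to $\beta^*$ shrinks by at least a factor $\max_{\lambda > 0}(1 - \eta\lambda)$ per step; for this toy data the nonzero eigenvalues are 1.5 and 0.5, giving a factor of 0.75.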

Code & Models

Repositories

lchizat/2022-wide-linear-nn (official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Mathematical Approximation and Integration · Markov Chains and Monte Carlo Methods