A Dynamical Central Limit Theorem for Shallow Neural Networks

Zhengdao Chen; Grant M. Rotskoff; Joan Bruna; Eric Vanden-Eijnden

arXiv:2008.09623 · math.PR · March 29, 2022 · 1 citation



TL;DR

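This paper proves that fluctuations around the mean-field limit of shallow neural networks trained with gradient descent remain bounded in mean square throughout training. Because the bound depends on the 2-norm of the underlying measure, which also controls the generalization error, the result motivates using this 2-norm as a regularizer.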

Contribution

It introduces a dynamical CLT for shallow neural networks, showing that the fluctuations remain bounded, that their variance depends on the 2-norm of the underlying measure, and that the asymptotic deviation vanishes when the mean-field dynamics converge to an interpolating measure.

Findings

1. Fluctuations remain bounded in mean square during training.

2. The 2-norm of the measure controls both the fluctuation variance and the generalization error.

3. The asymptotic deviation vanishes if the mean-field dynamics converge to an interpolating measure.

Abstract

Recent theoretical works have characterized the dynamics of wide shallow neural networks trained via gradient descent in an asymptotic mean-field limit when the width tends towards infinity. At initialization, the random sampling of the parameters leads to deviations from the mean-field limit dictated by the classical Central Limit Theorem (CLT). However, since gradient descent induces correlations among the parameters, it is of interest to analyze how these fluctuations evolve. Here, we use a dynamical CLT to prove that the asymptotic fluctuations around the mean limit remain bounded in mean square throughout training. The upper bound is given by a Monte-Carlo resampling error, with a variance that depends on the 2-norm of the underlying measure, which also controls the generalization error. This motivates the use of this 2-norm as a regularization term during training.…
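As a rough illustration of the classical CLT scaling at initialization that the abstract starts from (not the paper's dynamical CLT, which tracks how fluctuations evolve during training), the sketch below samples the parameters of a width-n network i.i.d. from a fixed measure and measures the mean-square deviation from the mean-field limit. It assumes a hypothetical one-dimensional network f_n(x) = (1/n) Σ c_i tanh(w_i x) with standard Gaussian parameters; the tanh activation, the Gaussian initial measure and the helper names `shallow_net`, `sample_params` and `mean_square_deviation` are illustrative choices, not taken from the paper.

```python
import math
import random
import statistics


def shallow_net(params, x):
    """Hypothetical 1-D shallow network f_n(x) = (1/n) * sum_i c_i * tanh(w_i * x)."""
    return sum(c * math.tanh(w * x) for c, w in params) / len(params)


def sample_params(n, rng):
    """Draw n i.i.d. (c, w) pairs from an assumed standard Gaussian initial measure."""
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def mean_square_deviation(n, x, trials, rng):
    """Monte-Carlo estimate of E[(f_n(x) - f(x))^2] over independent initializations.

    With c centered and independent of w (an assumption of this sketch),
    the mean-field limit is f(x) = 0, so the deviation is simply f_n(x).
    """
    return statistics.fmean(
        shallow_net(sample_params(n, rng), x) ** 2 for _ in range(trials)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 400):
        msd = mean_square_deviation(n, x=1.0, trials=1000, rng=rng)
        # n * msd stays roughly constant: the classical 1/sqrt(n) CLT scaling.
        print(f"n={n:4d}  n * E[f_n^2] ~ {n * msd:.3f}")
```

Under these assumptions E[f_n(x)²] = E[tanh²(w x)] / n exactly, which is a plain Monte-Carlo sampling error; rescaling by n (equivalently, the deviation by √n) therefore gives a width-independent quantity, and the printed values should hover around one constant.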


Videos

A Dynamical Central Limit Theorem for Shallow Neural Networks · SlidesLive

Taxonomy

Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference