Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks
Margalit Glasgow, Joan Bruna

TL;DR
This paper proves uniform-in-time bounds for the difference between finite-width neural network outputs and their mean-field limit during training, under certain regularity conditions, without relying on landscape convexity or noise.
Contribution
It establishes non-asymptotic, uniform-in-time weak propagation-of-chaos results for shallow neural networks in the feature-learning regime, extending previous bounds to long-time behavior.
Findings
Uniform-in-time bounds depend on the convergence rate of mean-field dynamics.
Achieves $\text{poly}(d)\, m^{-\min(1, c/6)}$ error under certain conditions.
Faster than $t^{-2}$ convergence rate yields polynomial complexity in $d/\text{error}$.
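To make the width dependence in the findings above concrete, here is a minimal sketch of the arithmetic implied by an error bound of the form $\text{prefactor} \cdot m^{-\min(1, c/6)}$. The function names and the `prefactor` argument (standing in for the unspecified $\text{poly}(d)$ factor) are illustrative assumptions, not quantities taken from the paper.

```python
import math


def rate_exponent(c: float) -> float:
    """Exponent alpha = min(1, c/6) in the m^{-alpha} width dependence."""
    if c <= 0:
        raise ValueError("c must be positive")
    return min(1.0, c / 6.0)


def width_for_error(target_error: float, c: float, prefactor: float = 1.0) -> int:
    """Smallest integer m with prefactor * m^{-alpha} <= target_error.

    `prefactor` is a hypothetical stand-in for the poly(d) factor,
    whose exact form is not given in this summary.
    """
    if target_error <= 0 or prefactor <= 0:
        raise ValueError("target_error and prefactor must be positive")
    alpha = rate_exponent(c)
    # Solve prefactor * m^{-alpha} = target_error for m, then round up.
    return math.ceil((prefactor / target_error) ** (1.0 / alpha))


if __name__ == "__main__":
    # The exponent saturates at 1 once c >= 6, so larger c gives no further gain.
    for c in (3.0, 6.0, 12.0):
        print(c, rate_exponent(c), width_for_error(0.25, c))
```

Under this reading, halving $c$ from 6 to 3 squares the width needed for the same target error, which is why the value of $c$ matters for the overall complexity.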
Abstract
We consider one-hidden-layer neural networks trained in the feature-learning regime using gradient descent, and relate the output of the finite-width network to its infinite-width counterpart, which evolves in the mean-field dynamics. While constant-time-horizon bounds may be obtained via standard Grönwall estimates, the long-time behavior of the fluctuation is a more delicate matter. Uniform-in-time bounds often rely on (local) strong convexity in the landscape or logarithmic Sobolev inequalities present in noisy gradient dynamics. In this work, we establish non-asymptotic weak propagation-of-chaos that holds uniformly in time, obtained by exploiting instead the convergence rate of the mean-field deterministic Wasserstein-gradient-flow dynamics. Specifically, denoting by the mean-field…
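As background for the Grönwall estimates mentioned in the abstract, a generic version of the argument can be sketched as follows. The symbols $\Delta(t)$ (a fluctuation size), $L$ (a Lipschitz-type constant), and $\varepsilon$ (a small forcing term) are illustrative assumptions and not the paper's notation.

$$
\frac{d}{dt}\Delta(t) \le L\,\Delta(t) + \varepsilon, \quad \Delta(0) = 0
\quad\Longrightarrow\quad
\Delta(t) \le \frac{\varepsilon}{L}\left(e^{Lt} - 1\right).
$$

The right-hand side grows exponentially in $t$, so a bound of this type is informative only on a fixed time horizon, which is why uniform-in-time control calls for a different argument.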
