Uniform-in-time concentration in two-layer neural networks via transportation inequalities
Arnaud Guillin (LMBP), Boris Nectoux (LMBP), Paul Stos (LMBP)

TL;DR
This paper establishes uniform-in-time, high-probability concentration bounds for two-layer neural networks trained with SGD, using transportation inequalities to control the gap between the network's predictions and their mean-field limit, with dimension-free rates in the sliced-Wasserstein distance.
Contribution
It establishes transportation inequalities for the law of the SGD parameters and derives uniform-in-time concentration bounds, in Wasserstein and sliced-Wasserstein distances, for the empirical parameter measure, which translate into prediction-error estimates.
Findings
Uniform-in-time concentration bounds for neural network predictions
Transportation inequalities for SGD parameter distributions
Dimension-free convergence rates in sliced-Wasserstein distance
Abstract
We quantify, uniformly over time and with high probability, the discrepancy between the predictions of a two-layer neural network trained by stochastic gradient descent (SGD) and their mean-field limit, for quadratic loss and ridge regularization. As a key ingredient, we establish $T_p$ transportation inequalities ($p \in \{1, 2\}$) for the law of the SGD parameters, with explicit constants independent of the iteration index. We then prove uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein distance $W_1$, and we translate these bounds into prediction-error estimates against a fixed test function. We also derive analogous concentration bounds in the sliced-Wasserstein distance $\mathrm{SW}_1$, leading to dimension-free rates.
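To make the sliced-Wasserstein distance mentioned in the abstract concrete, here is a minimal sketch of a Monte Carlo estimate of SW_1 between two empirical measures with the same number of atoms. It is an illustration only and not taken from the paper: the function names `wasserstein1_1d` and `sliced_wasserstein1`, the number of projections, and the use of random directions are all assumptions. It relies on the standard fact that, in one dimension and with equal uniform weights, W_1 is obtained by matching sorted samples.

```python
import numpy as np


def wasserstein1_1d(x, y):
    """W_1 between two 1D empirical measures with equally many uniform atoms.

    In 1D the optimal coupling matches the sorted samples, so W_1 is the
    mean absolute difference of the order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return float(np.mean(np.abs(x - y)))


def sliced_wasserstein1(X, Y, n_projections=200, seed=0):
    """Monte Carlo estimate of SW_1 between empirical measures on R^d.

    X, Y: arrays of shape (n, d) with the same n (hypothetical inputs,
    e.g. two sets of neuron parameters). The integral over the unit
    sphere is approximated by averaging over random directions.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    # Normalized Gaussian vectors are uniformly distributed on the sphere.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both samples onto every direction: shape (n, n_projections).
    px = np.sort(X @ theta.T, axis=0)
    py = np.sort(Y @ theta.T, axis=0)

    # Average the 1D W_1 distances over all directions.
    return float(np.mean(np.abs(px - py)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 10))
    Y = rng.standard_normal((500, 10)) + 0.5
    print(sliced_wasserstein1(X, Y))
```

Each projection reduces the problem to a one-dimensional sort, which is why the distance is cheap to evaluate in any dimension; the dimension-free rates themselves are a result of the paper, not of this sketch.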
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Model Reduction and Neural Networks
