Uniform-in-time concentration in two-layer neural networks via transportation inequalities

Arnaud Guillin (LMBP); Boris Nectoux (LMBP); Paul Stos (LMBP)

arXiv:2603.01842 · cs.NE · March 3, 2026

TL;DR

This paper bounds, uniformly in time and with high probability, the gap between the predictions of a two-layer neural network trained with SGD and their mean-field limit. The key tool is a family of transportation inequalities for the law of the SGD parameters; in the sliced-Wasserstein distance the resulting rates are dimension-free.

Contribution

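It establishes transportation inequalities for the law of the SGD parameters, with explicit constants independent of the iteration index, and derives uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein distance $W_1$, which in turn yields prediction-error estimates.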
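For readers new to the terminology, the standard definitions behind these statements can be recalled as follows. This is a sketch under assumed conventions: the symbols $\mu$, $\nu$, $\pi$, $H$ and the normalization of the constant $C$ are not taken from the paper, which may use a different one.

```latex
% Wasserstein distance of order p between probability measures mu, nu on R^d,
% where Pi(mu, nu) is the set of couplings of mu and nu.
W_p(\mu,\nu) = \Big( \inf_{\pi \in \Pi(\mu,\nu)} \int |x-y|^p \, d\pi(x,y) \Big)^{1/p}

% Relative entropy of nu with respect to mu (for nu absolutely continuous w.r.t. mu).
H(\nu \mid \mu) = \int \log\frac{d\nu}{d\mu} \, d\nu

% mu satisfies the transportation inequality T_p(C) if, for all such nu,
W_p(\nu,\mu) \le \sqrt{2\,C\,H(\nu \mid \mu)}
```

Since $W_1 \le W_2$, $T_2(C)$ implies $T_1(C)$. A $T_1$ inequality is equivalent to Gaussian concentration for Lipschitz functions, which is what makes such inequalities a natural route to high-probability bounds.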

Findings

1. Uniform-in-time concentration bounds for neural network predictions

2. Transportation inequalities for SGD parameter distributions

3. Dimension-free convergence rates in sliced-Wasserstein distance

Abstract

We quantify, uniformly over time and with high probability, the discrepancy between the predictions of a two-layer neural network trained by stochastic gradient descent (SGD) and their mean-field limit, for quadratic loss and ridge regularization. As a key ingredient, we establish $T_p$ transportation inequalities ($p \in \{1, 2\}$) for the law of the SGD parameters, with explicit constants independent of the iteration index. We then prove uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein distance $W_1$, and we translate these bounds into prediction-error estimates against a fixed test function $\Phi$. We also derive analogous concentration bounds in the sliced-Wasserstein distance $SW_1$, leading to dimension-free rates.
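To make the sliced-Wasserstein distance $SW_1$ concrete, here is a minimal numerical sketch, not the paper's method. It assumes two empirical measures with the same number of equally weighted atoms, and it approximates the average over directions by Monte Carlo sampling on the unit sphere. The function name `sliced_w1` and its parameters `n_dirs` and `seed` are hypothetical.

```python
import numpy as np


def sliced_w1(x, y, n_dirs=256, seed=0):
    """Monte Carlo estimate of SW_1 between two empirical measures.

    x, y: arrays of shape (n, d) holding n equally weighted atoms each
    (equal sample sizes are an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    dirs = rng.normal(size=(n_dirs, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project both point clouds onto every direction: shape (n, n_dirs).
    px = x @ dirs.T
    py = y @ dirs.T
    # In one dimension, W_1 between equal-size empirical measures is the mean
    # absolute difference of the sorted samples; averaging over all entries
    # also averages over the sampled directions.
    return float(np.mean(np.abs(np.sort(px, axis=0) - np.sort(py, axis=0))))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    particles = rng.normal(size=(500, 10))   # stand-in for network parameters
    reference = rng.normal(size=(500, 10))   # stand-in for mean-field samples
    print(sliced_w1(particles, reference))
```

Each projection reduces the problem to a one-dimensional transport problem solved by sorting, which is why sliced distances are cheap to compute and why their statistical rates can avoid a dependence on the ambient dimension.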

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Model Reduction and Neural Networks