Implicit Regularization of Large Neural Networks via Mean-Field Formulation

Beatrice Acciaio; Jakob Heiss; Gudmund Pammer; Qinxin Yan

arXiv:2603.20892·math.OC·March 24, 2026

Open Access

TL;DR

This paper uses mean-field theory and stochastic control to explain how early stopping acts as an implicit regularizer when training overparametrized neural networks, and it ties the training dynamics to a new metric on probability measures.

Contribution

The paper develops a mean-field, control-based formulation of neural network training dynamics and shows how early stopping induces implicit regularization through a new metric on probability measures.

Findings

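01. In the mean-field limit, the parameter distribution follows a gradient flow on the space of probability measures.

02. A new metric on probability measures is introduced, generalizing the control representation of the Wasserstein-2 distance.

03. Non-asymptotic bounds describe how the induced regularization depends on the stopping time.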
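
For orientation, the two standard objects these findings build on can be written as follows. This is a sketch in generic textbook notation, not the paper's own: the symbols F (training objective on measures), \mu_t (parameter distribution at time t), and \alpha (control) are assumptions chosen here, and the paper's new metric is not reproduced because this page does not state it.

```latex
% Wasserstein gradient flow of an objective F on probability measures
% (assumed notation; \delta F / \delta \mu is the first variation of F)
\[
  \partial_t \mu_t
  = \nabla_\theta \cdot \Big( \mu_t \, \nabla_\theta \frac{\delta F}{\delta \mu}(\mu_t, \theta) \Big)
\]

% Control (Benamou-Brenier type) representation of the Wasserstein-2 distance
\[
  W_2^2(\mu, \nu)
  = \inf_{\alpha} \Big\{ \mathbb{E} \int_0^1 |\alpha_t|^2 \, dt \;:\;
    dX_t = \alpha_t \, dt,\ \mathrm{Law}(X_0) = \mu,\ \mathrm{Law}(X_1) = \nu \Big\}
\]
```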

Abstract

We propose a mathematical framework to explain implicit regularization from early stopping during the training of overparametrized neural networks. In the mean-field limit, the parameter distribution evolves according to a gradient flow on the space of probability measures. We show that these dynamics admit an equivalent McKean-Vlasov stochastic control formulation through the corresponding Hamilton-Jacobi-Bellman (HJB) equation. The control viewpoint yields a Dynamic Programming Principle (DPP), which we use to define a new metric on probability measures. This metric can be viewed as a mean-field generalization of the control representation of the Wasserstein-2 distance, and it naturally appears as a regularization term selected by early stopping. We further obtain non-asymptotic bounds describing how the induced regularization depends on the stopping time.
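
The setting in the abstract can be illustrated with a small particle simulation. The following is a minimal sketch under assumptions that are not taken from the paper: a two-layer tanh network with 1/N (mean-field) scaling, a squared loss on a toy 1D data set, and explicit Euler steps standing in for the gradient flow. It records, for each stopping time, the training loss and the mean squared displacement of the particles from initialization. That displacement is only an upper bound on the squared Wasserstein-2 distance between the initial and current empirical measures (via the identity coupling); it is not the metric introduced in the paper. All function names are hypothetical.

```python
import math
import random


def unit(theta, x):
    """Single neuron phi(theta; x) = a * tanh(w * x + b)."""
    a, w, b = theta
    return a * math.tanh(w * x + b)


def predict(particles, x):
    """Mean-field network: average of the neurons over the empirical measure."""
    return sum(unit(th, x) for th in particles) / len(particles)


def loss(particles, data):
    """Squared loss, averaged over the data set (with the usual factor 1/2)."""
    return sum((predict(particles, x) - y) ** 2 for x, y in data) / (2 * len(data))


def velocity(theta, residuals, data):
    """Particle velocity: minus the theta-gradient of the first variation of the loss."""
    a, w, b = theta
    ga = gw = gb = 0.0
    for r, (x, _) in zip(residuals, data):
        t = math.tanh(w * x + b)
        s = a * (1.0 - t * t)  # derivative of a * tanh(z) with respect to z
        ga += r * t
        gw += r * s * x
        gb += r * s
    n = len(data)
    return (-ga / n, -gw / n, -gb / n)


def train(particles, data, steps, dt):
    """Explicit Euler steps; returns (time, loss, mean squared displacement) per step."""
    init = [tuple(th) for th in particles]
    history = []
    for k in range(steps + 1):
        displacement = sum(
            sum((p - q) ** 2 for p, q in zip(th, th0))
            for th, th0 in zip(particles, init)
        ) / len(particles)
        history.append((k * dt, loss(particles, data), displacement))
        if k == steps:
            break
        residuals = [predict(particles, x) - y for x, y in data]
        particles = [
            tuple(p + dt * v for p, v in zip(th, velocity(th, residuals, data)))
            for th in particles
        ]
    return history


# Toy problem (assumed for illustration): 8 points of y = sin(2x), 50 particles.
random.seed(0)
data = [(x, math.sin(2 * x)) for x in [-1 + 2 * j / 7 for j in range(8)]]
particles = [
    (random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)
]
history = train(particles, data, steps=200, dt=0.1)

for t, current_loss, displacement in history[::50]:
    print(f"T={t:5.1f}  loss={current_loss:.4f}  displacement={displacement:.4f}")
```

Reading the printed rows as a function of the stopping time T gives a rough picture of the trade-off the abstract describes: stopping earlier leaves the parameter distribution closer to its initialization, at the price of a larger training loss.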

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Adversarial Robustness in Machine Learning