Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks
Alexander Shevchenko, Vyacheslav Kungurtsev, Marco Mondelli

TL;DR
This paper shows that wide two-layer ReLU networks trained with SGD on a univariate regularized regression problem converge to piecewise linear functions with at most three knot points between two consecutive training inputs. This reveals a bias towards simple solutions. The paper also characterizes the weight distribution at convergence.
Contribution
It develops a mean-field analysis of SGD-trained two-layer ReLU networks, showing that they favor simple piecewise linear solutions with a bounded number of knot points and that the weight distribution at convergence has a Gibbs form.
Findings
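At convergence, SGD yields a piecewise linear solution with at most three knots between two consecutive training inputs.
As the number of neurons grows, the weight distribution at convergence approaches the unique minimizer of a related free energy, which has a Gibbs form.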
Empirical evidence suggests knots can occur away from data points, aligning with theoretical predictions.
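To make the notion of a knot concrete, here is a minimal sketch in Python. It assumes a mean-field parametrization f(x) = (1/N) * sum_i a_i * relu(w_i * x + b_i), which is a common convention and not necessarily the exact one used in the paper. The function names and the toy weights are hypothetical. Each neuron can bend the function only at x = -b_i / w_i, so counting these breakpoints between consecutive training inputs gives an upper bound on the number of knots there.

```python
import numpy as np

def relu_net(x, a, w, b):
    """Assumed parametrization: f(x) = (1/N) * sum_i a_i * relu(w_i * x + b_i)."""
    x = np.asarray(x, dtype=float)
    pre = np.outer(x, w) + b              # pre-activations, shape (len(x), N)
    return np.maximum(pre, 0.0) @ a / len(a)

def knot_locations(w, b, tol=1e-12):
    """Sorted neuron breakpoints x = -b_i / w_i, skipping neurons with w_i ~ 0."""
    active = np.abs(w) > tol
    return np.sort(-b[active] / w[active])

def knots_per_interval(knots, train_x):
    """Count breakpoints strictly between consecutive distinct training inputs."""
    xs = np.unique(np.asarray(train_x, dtype=float))    # sorted, deduplicated
    left = np.searchsorted(knots, xs[:-1], side="right")   # knots <= left end
    right = np.searchsorted(knots, xs[1:], side="left")    # knots <  right end
    return right - left

# Toy weights, chosen by hand for illustration only (not trained).
a = np.array([1.0, 1.0, 1.0, 1.0])
w = np.array([1.0, 1.0, -1.0, 2.0])
b = np.array([0.0, -1.0, 0.5, -3.0])
train_x = np.array([-1.0, 0.25, 2.0])

knots = knot_locations(w, b)
counts = knots_per_interval(knots, train_x)
```

This counts per-neuron breakpoints, not distinct knots of the function itself: several neurons may share a breakpoint, and a neuron with a_i = 0 contributes none. Applying it to trained weights would therefore require merging nearby breakpoints before comparing against the bound of three.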
Abstract
Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view, and consider a two-layer ReLU network trained via SGD for a univariate regularized regression problem. Our main result is that SGD is biased towards a simple solution: at convergence, the ReLU network implements a piecewise linear map of the inputs, and the number of "knot" points - i.e., points where the tangent of the ReLU network estimator changes - between two consecutive training inputs is at most three. In particular, as the number of neurons of the network grows, the SGD dynamics is captured by the solution of a gradient flow and, at convergence, the distribution of the weights approaches the unique minimizer of a related free energy, which has a Gibbs form. Our key technical contribution consists in…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
Methods: Stochastic Gradient Descent
