Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach
Grant M. Rotskoff, Eric Vanden-Eijnden

TL;DR
This paper models neural network training as an interacting particle system. It establishes conditions for global convergence, quantifies how the approximation error scales with network size, and offers insight into the dynamics of stochastic gradient descent in high-dimensional settings.
Contribution
It introduces a novel particle system framework for analyzing neural network training, deriving universal error bounds and convergence properties for large networks.
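The particle-system view can be made concrete with a short derivation. The following is a sketch that assumes the mean-field parameterization common to this line of work: the network output is an average of n unit outputs $\varphi(x,\theta_i)$, the target is $f$, and data are drawn from a distribution $\nu$. The symbols $C_f$, $F$, and $K$ are introduced here for illustration. Expanding the squared loss shows that each unit (particle) feels an external potential $-F$ and a pairwise interaction $K$ with every other unit:

```latex
f_n(x) = \frac{1}{n}\sum_{i=1}^{n} \varphi(x,\theta_i), \qquad
\mathcal{L}(\theta_1,\dots,\theta_n) = \tfrac{1}{2}\,\mathbb{E}_{x\sim\nu}\big|f(x) - f_n(x)\big|^2

\mathcal{L} = C_f - \frac{1}{n}\sum_{i=1}^{n} F(\theta_i)
            + \frac{1}{2n^2}\sum_{i,j=1}^{n} K(\theta_i,\theta_j),
\quad
C_f = \tfrac{1}{2}\mathbb{E}\,|f(x)|^2,\;
F(\theta) = \mathbb{E}\,[f(x)\varphi(x,\theta)],\;
K(\theta,\theta') = \mathbb{E}\,[\varphi(x,\theta)\varphi(x,\theta')]

% gradient flow with time rescaled by n (K symmetric):
\dot\theta_i = \nabla F(\theta_i) - \frac{1}{n}\sum_{j=1}^{n} \nabla_{\theta} K(\theta_i,\theta_j)
             = \mathbb{E}\big[(f(x) - f_n(x))\,\nabla_\theta \varphi(x,\theta_i)\big]
```

The last equality shows that the interaction between particles enters only through the shared output $f_n$.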
Findings
Approximation error scales as O(n^{-1}) for large networks, where n is the number of hidden units
SGD convergence rate is independent of network size
Guidelines for step size and batch size in training
Abstract
Neural networks, a central tool in machine learning, have demonstrated remarkable, high-fidelity performance on image recognition and classification tasks. These successes evince an ability to accurately represent high dimensional functions, but rigorous results about the approximation error of neural networks after training are few. Here we establish conditions for global convergence of the standard optimization algorithm used in machine learning applications, stochastic gradient descent (SGD), and quantify the scaling of its error with the size of the network. This is done by reinterpreting SGD as the evolution of a particle system with interactions governed by a potential related to the objective or "loss" function used to train the network. We show that, when the number of units is large, the empirical distribution of the particles descends on a convex landscape towards the…
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Mathematical Approximation and Integration · Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent
