Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach

Grant M. Rotskoff; Eric Vanden-Eijnden

arXiv:1805.00915·stat.ML·February 8, 2023·96 cites

Open Access

TL;DR

This paper models neural network training as an interacting particle system, establishing conditions for convergence and error scaling, and providing insights into the dynamics of stochastic gradient descent in high-dimensional settings.

Contribution

The paper introduces a particle-system framework for analyzing neural network training and uses it to derive universal error bounds and convergence properties for large networks.

Findings

01 Error scales as $O(n^{-1})$ for large networks

02 SGD convergence rate is independent of network size

03 Guidelines for step size and batch size in training

Abstract

Neural networks, a central tool in machine learning, have demonstrated remarkable, high fidelity performance on image recognition and classification tasks. These successes evince an ability to accurately represent high dimensional functions, but rigorous results about the approximation error of neural networks after training are few. Here we establish conditions for global convergence of the standard optimization algorithm used in machine learning applications, stochastic gradient descent (SGD), and quantify the scaling of its error with the size of the network. This is done by reinterpreting SGD as the evolution of a particle system with interactions governed by a potential related to the objective or "loss" function used to train the network. We show that, when the number $n$ of units is large, the empirical distribution of the particles descends on a convex landscape towards the…
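The particle-system reading of SGD described in the abstract can be illustrated with a small sketch. It is not the authors' code: it assumes a one-dimensional target `sin(2x)`, `tanh` units, a network written as an average over `n` units, and hypothetical names (`units`, `network`, `sgd_step`, `run_demo`) and hyperparameters chosen only for illustration. Each unit's parameters `(c_i, w_i, b_i)` play the role of one particle, and the particles interact only through the shared residual of the averaged network.

```python
import numpy as np


def units(params, x):
    """Per-unit activations tanh(w_i * x + b_i), shape (n, m)."""
    _, w, b = params
    return np.tanh(w[:, None] * x[None, :] + b[:, None])


def network(params, x):
    """Averaged network f_n(x) = (1/n) * sum_i c_i * tanh(w_i * x + b_i)."""
    c = params[0]
    return np.mean(c[:, None] * units(params, x), axis=0)


def loss(params, x, y):
    """Half mean-squared error between the network and the target values."""
    return 0.5 * np.mean((network(params, x) - y) ** 2)


def sgd_step(params, x, y, lr):
    """Move every particle (c_i, w_i, b_i) down its gradient on one batch.

    Assumption: gradients are rescaled by n (the 1/n from the network
    average is dropped), so the per-particle step does not shrink with n.
    """
    c, w, b = params
    act = units(params, x)                # (n, batch)
    resid = network(params, x) - y        # (batch,), shared by all particles
    dact = c[:, None] * (1.0 - act ** 2)  # d/d(arg) of c_i * tanh(arg)
    grad_c = np.mean(act * resid, axis=1)
    grad_w = np.mean(dact * x * resid, axis=1)
    grad_b = np.mean(dact * resid, axis=1)
    return c - lr * grad_c, w - lr * grad_w, b - lr * grad_b


def run_demo(n=200, steps=1000, batch=32, lr=0.1, seed=0):
    """Train on mini-batches and return (initial_loss, final_loss)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-2.0, 2.0, 256)
    y = np.sin(2.0 * x)                   # illustrative target, not from the paper
    params = tuple(rng.normal(size=n) for _ in range(3))
    initial = loss(params, x, y)
    for _ in range(steps):
        idx = rng.choice(x.size, size=batch, replace=False)
        params = sgd_step(params, x[idx], y[idx], lr)
    return initial, loss(params, x, y)


if __name__ == "__main__":
    initial, final = run_demo()
    print(f"loss before training: {initial:.4f}, after: {final:.4f}")
```

Calling `run_demo` with different values of `n` is one informal way to look at how the final loss depends on network size under these assumptions. The sketch makes no claim to reproduce the scaling results stated in the abstract.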

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Mathematical Approximation and Integration · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent