A mean-field limit for certain deep neural networks

Dyego Araújo; Roberto I. Oliveira; Daniel Yukimura

arXiv:1906.00193 · math.ST · June 4, 2019 · 39 cites

TL;DR

This paper establishes a mean-field limit for multilayer deep neural networks trained by stochastic gradient descent, giving a rigorous description of their training dynamics as the number of neurons per layer tends to infinity.

Contribution

It extends previous mean-field analyses from shallow networks to deep networks with a fixed number of inner layers, rigorously deriving the limiting behavior and proving existence and uniqueness for the associated McKean-Vlasov problem.

Findings

01

Network weights are approximated by ideal particles described by a mean-field model.

02

In the limit $N \to +\infty$ of many neurons per layer, the mean-field model describes the evolution of the deep neural network during training.

03

Rigorous proof of existence and uniqueness for the McKean-Vlasov problem in this context.

Abstract

Understanding deep neural networks (DNNs) is a key challenge in the theory of machine learning, with potential applications to the many fields where DNNs have been successfully used. This article presents a scaling limit for a DNN being trained by stochastic gradient descent. Our networks have a fixed (but arbitrary) number $L \geq 2$ of inner layers; $N \gg 1$ neurons per layer; full connections between layers; and fixed weights (or "random features" that are not trained) near the input and output. Our results describe the evolution of the DNN during training in the limit when $N \to +\infty$, which we relate to a mean field model of McKean-Vlasov type. Specifically, we show that network weights are approximated by certain "ideal particles" whose distribution and dependencies are described by the mean-field model. A key part of the proof is to show existence and uniqueness for our…
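To make the architecture in the abstract concrete, the following is a minimal Python sketch of a forward pass, not the authors' construction. Several details are assumptions that the abstract does not state: the function names `init_network` and `forward`, the `tanh` activation, the Gaussian initialization, the scalar output, and the $1/N$ averaging (a normalization commonly used in mean-field analyses). The sketch also assumes that "fixed weights near the input and output" means exactly the first and last weight sets are frozen; the paper's own parameterization may differ.

```python
import math
import random


def init_network(d_in, n_neurons, n_inner, seed=0):
    """Draw i.i.d. Gaussian weights for n_inner inner layers of width n_neurons.

    Hypothetical helper: the layout below is an illustrative assumption.
    """
    if n_inner < 2:
        raise ValueError("the abstract assumes L >= 2 inner layers")
    rng = random.Random(seed)

    def gauss_matrix(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {
        # Fixed random features near the input (assumed never updated by SGD).
        "w_in": gauss_matrix(n_neurons, d_in),
        # Full connections between consecutive inner layers (assumed trainable).
        "w_hidden": [gauss_matrix(n_neurons, n_neurons) for _ in range(n_inner - 1)],
        # Fixed weights near the output (assumed never updated by SGD).
        "w_out": [rng.gauss(0.0, 1.0) for _ in range(n_neurons)],
    }


def forward(params, x):
    """Return a scalar output; each unit averages (1/N) over the previous layer."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in params["w_in"]]
    n = len(h)
    for matrix in params["w_hidden"]:
        # Assumed mean-field scaling: an average over N units, not a sum.
        h = [math.tanh(sum(w * hj for w, hj in zip(row, h)) / n) for row in matrix]
    return sum(w * hi for w, hi in zip(params["w_out"], h)) / n


# Usage: L = 3 inner layers with N = 200 neurons each, on a 3-dimensional input.
params = init_network(d_in=3, n_neurons=200, n_inner=3, seed=0)
print(forward(params, [0.5, -1.0, 2.0]))
```

Under this averaging, each unit sees the previous layer only through an empirical mean over $N$ units, which is the kind of quantity a mean-field limit replaces by an expectation as $N \to +\infty$.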

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Mathematical Approximation and Integration