Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

Song Mei; Theodor Misiakiewicz; Andrea Montanari

arXiv:1902.06015 · stat.ML · February 19, 2019 · 94 cites

TL;DR

This paper provides improved theoretical guarantees for the mean-field approximation of two-layer neural network training dynamics, demonstrating dimension-free bounds and connecting the analysis to kernel ridge regression.

Contribution

It establishes that the mean-field approximation holds under less restrictive conditions, including unbounded activations and noise, and links neural network training to kernel methods.

Findings

1. Dimension-free bounds for mean-field approximation
2. Extension to unbounded activation functions
3. Connection to kernel ridge regression

Abstract

We consider learning two-layer neural networks using stochastic gradient descent. The mean-field description of these learning dynamics approximates the evolution of the network weights by an evolution in the space of probability distributions in $\mathbb{R}^{D}$ (where $D$ is the number of parameters associated to each neuron). This evolution can be defined through a partial differential equation or, equivalently, as the gradient flow in the Wasserstein space of probability distributions. Earlier work shows that, under some regularity assumptions, the mean-field description is accurate as soon as the number of hidden units is much larger than the dimension $D$. In this paper we establish stronger and more general approximation guarantees. First of all, we show that the number of hidden units only needs to be larger than a quantity dependent on the regularity properties of the data, and…
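The setting in the abstract can be illustrated with a minimal sketch: a two-layer network whose output is an average over N hidden units, trained by online SGD, so that the rows of the parameter arrays form an empirical distribution over $R^{D}$. The names `init_params`, `predict`, and `sgd_step`, the tanh activation, the Gaussian initialization, the toy target, and the step size are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def init_params(n_hidden, dim, rng):
    # Assumption: i.i.d. Gaussian initialization of theta_i = (a_i, w_i).
    a = rng.normal(size=n_hidden)
    w = rng.normal(size=(n_hidden, dim)) / np.sqrt(dim)
    return a, w


def predict(a, w, x):
    # Mean-field scaling: average (not sum) over the N hidden units.
    return np.mean(a * np.tanh(w @ x))


def sgd_step(a, w, x, y, step):
    n_hidden = a.shape[0]
    h = np.tanh(w @ x)            # hidden activations, shape (N,)
    err = y - np.mean(a * h)      # residual on this sample
    # Gradients of 0.5 * (y - f(x))^2 with respect to a and w.
    grad_a = -err * h / n_hidden
    grad_w = -err * (a * (1.0 - h ** 2))[:, None] * x[None, :] / n_hidden
    # Assumption: the step is multiplied by N so that each neuron
    # moves at a rate that does not shrink as N grows.
    return a - step * n_hidden * grad_a, w - step * n_hidden * grad_w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, w = init_params(n_hidden=200, dim=5, rng=rng)
    for _ in range(2000):
        x = rng.normal(size=5)
        y = np.tanh(x[0])         # hypothetical toy target
        a, w = sgd_step(a, w, x, y, step=0.05)
    # The N rows (a_i, w_i) are samples of the evolving parameter distribution.
    print(a.mean(), w.mean(axis=0))
```

In this sketch each neuron carries D = dim + 1 parameters, and the mean-field description would track the distribution of the pairs (a_i, w_i) rather than the individual weights.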

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Markov Chains and Monte Carlo Methods