Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates

Jingwei Zhang; Xunpeng Huang; Jincheng Yu

arXiv:2205.09860·cs.LG·October 19, 2022

Open Access

TL;DR

This paper proves the first linear convergence rate for training two-layer neural networks in the mean-field regime using continuous-time noisy gradient descent, advancing understanding of global optimality beyond the neural tangent kernel approximation.

Contribution

It establishes a quantitative linear convergence rate for two-layer neural networks in the mean-field regime, a significant step beyond prior asymptotic results.

Findings

01

First linear convergence result for mean-field two-layer networks

02

Uses a novel time-dependent estimate of logarithmic Sobolev constants

03

Demonstrates global optimality with linear convergence rates

Abstract

We consider optimizing two-layer neural networks in the mean-field regime, where the learning dynamics of the network weights can be approximated by an evolution in the space of probability measures over the weight parameters associated with the neurons. The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime, which is restricted to a local neighborhood in the so-called neural tangent kernel space around specialized initializations. Several prior works (Chizat and Bach, 2018; Mei et al., 2018) establish the asymptotic global optimality of the mean-field regime, but obtaining a quantitative convergence rate remains challenging because of the complicated, unbounded nonlinearity of the training dynamics. This work establishes the first linear convergence result for vanilla two-layer neural networks trained by continuous-time noisy gradient descent in the mean-field…
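To make the setting concrete, the following is a minimal sketch of noisy gradient descent on a two-layer network with mean-field (1/m) output scaling. It is not the paper's construction: the paper analyzes continuous-time dynamics, whereas this is a simple time discretization, and the tanh activation, squared loss, L2 regularization, and all names and hyperparameters (`noisy_gd`, `lr`, `lam`, `temp`) are assumptions chosen for illustration.

```python
import numpy as np

def mean_field_forward(theta, x):
    """Two-layer network with mean-field scaling: f(x) = (1/m) * sum_j a_j * tanh(w_j . x).

    theta has shape (m, d + 1): column 0 holds a_j, the remaining columns hold w_j.
    """
    a, w = theta[:, 0], theta[:, 1:]
    return np.tanh(x @ w.T) @ a / theta.shape[0]

def noisy_gd(x, y, m=200, steps=500, lr=0.1, lam=1e-3, temp=1e-3, seed=0):
    """Discretized noisy gradient descent on the neurons (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    theta = rng.normal(size=(m, d + 1))  # each row is one neuron ("particle")
    losses = []
    for _ in range(steps):
        a, w = theta[:, 0], theta[:, 1:]
        h = np.tanh(x @ w.T)                 # hidden activations, shape (n, m)
        resid = h @ a / m - y                # prediction error, shape (n,)
        losses.append(0.5 * np.mean(resid ** 2))
        # Per-neuron gradient of the squared loss, multiplied by m so that each
        # neuron moves at an O(1) rate (the usual mean-field time scaling).
        grad_a = h.T @ resid / n                                         # shape (m,)
        grad_w = ((resid[:, None] * (1.0 - h ** 2)) * a[None, :]).T @ x / n  # shape (m, d)
        grad = np.column_stack([grad_a, grad_w]) + lam * theta          # add L2 term
        # Gaussian noise with variance 2 * lr * temp per coordinate (Langevin-type step).
        noise = rng.normal(size=theta.shape)
        theta = theta - lr * grad + np.sqrt(2.0 * lr * temp) * noise
    return theta, losses

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(64, 3))
    y = np.sin(x[:, 0])                      # synthetic target, for illustration only
    theta, losses = noisy_gd(x, y)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Each row of `theta` plays the role of one sample from the distribution over neuron weights; as the number of neurons m grows, the empirical distribution of the rows is the object whose evolution the mean-field analysis tracks.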

Taxonomy

Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Neural Networks and Applications

Methods: Neural Tangent Kernel