Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates
Jingwei Zhang, Xunpeng Huang, Jincheng Yu

TL;DR
This paper proves the first linear convergence rate for training two-layer neural networks in the mean-field regime using continuous-time noisy gradient descent, advancing understanding of global optimality beyond the neural tangent kernel approximation.
Contribution
It establishes a quantitative linear convergence rate for two-layer neural networks in the mean-field regime, a significant step beyond prior asymptotic results.
Findings
First linear convergence result for mean-field two-layer networks
Uses a novel time-dependent estimate of logarithmic Sobolev constants
Demonstrates global optimality with linear convergence rates
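For readers unfamiliar with the second finding, the textbook link between a logarithmic Sobolev inequality and a linear (exponential) rate can be sketched as follows. This is the standard linear Fokker-Planck case with a fixed target measure \mu and a fixed constant \alpha, both assumed here for illustration; it is not the paper's argument, which handles nonlinear mean-field dynamics through a time-dependent estimate of the constant.

```latex
% Log-Sobolev inequality with constant \alpha > 0 (standard form, assumed notation):
\mathrm{KL}(\rho \,\|\, \mu) \;\le\; \frac{1}{2\alpha}\, I(\rho \,\|\, \mu),
\qquad
I(\rho \,\|\, \mu) \;=\; \int \rho \,\Bigl|\nabla \log \frac{\rho}{\mu}\Bigr|^{2} \, d\theta .

% Along the Fokker--Planck flow \partial_t \rho_t = \nabla \cdot \bigl(\rho_t \nabla \log(\rho_t / \mu)\bigr):
\frac{d}{dt}\, \mathrm{KL}(\rho_t \,\|\, \mu) \;=\; -\, I(\rho_t \,\|\, \mu) \;\le\; -\,2\alpha\, \mathrm{KL}(\rho_t \,\|\, \mu)
\quad\Longrightarrow\quad
\mathrm{KL}(\rho_t \,\|\, \mu) \;\le\; e^{-2\alpha t}\, \mathrm{KL}(\rho_0 \,\|\, \mu).
```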
Abstract
We consider optimizing two-layer neural networks in the mean-field regime, where the learning dynamics of the network weights can be approximated by an evolution in the space of probability measures over the weight parameters associated with the neurons. The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime, which is restricted locally to the so-called neural tangent kernel space around specialized initializations. Several prior works (\cite{chizat2018global, mei2018mean}) establish the asymptotic global optimality of the mean-field regime, but obtaining a quantitative convergence rate remains challenging because of the complicated unbounded nonlinearity of the training dynamics. This work establishes the first linear convergence result for vanilla two-layer neural networks trained by continuous-time noisy gradient descent in the mean-field…
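To make the setting concrete, here is a minimal sketch of a discretized noisy gradient descent on a two-layer network with mean-field (1/N) output scaling. Everything specific in it is an assumption made for illustration rather than taken from the paper: the tanh activation, the squared loss, the L2 regularizer lam, the temperature tau, the Euler-Maruyama step lr, and the function names. The paper itself analyzes the continuous-time dynamics.

```python
import numpy as np

def forward(W, X):
    """Mean-field two-layer net: average of N tanh neurons (assumed architecture)."""
    return np.tanh(X @ W.T).mean(axis=1)

def objective(W, X, y, lam):
    """Squared loss plus an L2 penalty averaged over neurons (assumed objective)."""
    resid = forward(W, X) - y
    return 0.5 * np.mean(resid ** 2) + 0.5 * lam * np.mean(np.sum(W ** 2, axis=1))

def noisy_gd(W, X, y, lr=0.05, lam=1e-3, tau=1e-3, steps=200, seed=0):
    """Euler-Maruyama discretization of noisy gradient descent on the neurons.

    Each neuron w_i follows the drift N * d(objective)/dw_i plus Gaussian
    noise of variance 2 * lr * tau per step. Hyperparameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    W = W.copy()
    n = X.shape[0]
    for _ in range(steps):
        pre = X @ W.T                                  # (n, N) pre-activations
        resid = np.tanh(pre).mean(axis=1) - y          # (n,) prediction errors
        dact = 1.0 - np.tanh(pre) ** 2                 # tanh derivative
        grad = (resid[:, None] * dact).T @ X / n + lam * W   # (N, d) per-neuron drift
        noise = rng.standard_normal(W.shape)
        W = W - lr * grad + np.sqrt(2.0 * lr * tau) * noise
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((64, 2))
    y = np.tanh(X @ np.array([1.0, -1.0]))             # synthetic targets
    W0 = rng.standard_normal((50, 2))                  # N = 50 neurons
    W = noisy_gd(W0, X, y)
    print(objective(W0, X, y, 1e-3), objective(W, X, y, 1e-3))
```

Setting tau=0 recovers plain gradient descent, which is a convenient sanity check before adding noise. As N grows, the empirical distribution of the rows of W is the object whose evolution the mean-field analysis tracks.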
Taxonomy
Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
Methods: Neural Tangent Kernel
