Global Convergence of Second-order Dynamics in Two-layer Neural Networks
Walid Krichene, Kenneth F. Caluya, Abhishek Halder

TL;DR
This paper proves that second-order dynamics (gradient dynamics with momentum), specifically the heavy ball method, converge globally for two-layer neural networks in the mean field limit, extending earlier first-order results.
Contribution
It proves global convergence of second-order (momentum-based) dynamics for two-layer neural networks by studying a Lyapunov functional along the mean field dynamics, extending prior results that covered only first-order gradient flow.
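As a finite-dimensional analogue of the Lyapunov argument, consider the classical heavy ball ODE. This is a standard textbook computation, not the paper's functional; the objective F, the parameter vector θ, and the damping coefficient γ are notation assumed here for illustration. The mechanical energy is non-increasing along trajectories:

```latex
\begin{aligned}
\ddot{\theta}(t) + \gamma\,\dot{\theta}(t) &= -\nabla F(\theta(t)), \qquad \gamma > 0,\\
E(t) &= F(\theta(t)) + \tfrac{1}{2}\,\|\dot{\theta}(t)\|^{2},\\
\frac{dE}{dt} &= \langle \nabla F(\theta), \dot{\theta} \rangle + \langle \dot{\theta}, \ddot{\theta} \rangle
             = -\gamma\,\|\dot{\theta}(t)\|^{2} \le 0 .
\end{aligned}
```

The energy stops decreasing only where the velocity vanishes, which is why a functional of this kind is a natural tool for characterizing stationary points.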
Findings
Second-order (heavy ball) dynamics converge globally in the mean field limit.
The resulting integro-PDE is a nonlinear kinetic Fokker-Planck equation.
Numerical simulations suggest convergence for small networks.
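The findings above can be explored with a small particle simulation. The following is a minimal sketch, not the authors' experimental setup: the tanh activation, the single-neuron teacher, the squared loss, the semi-implicit Euler discretization, and all names and default values (simulate_heavy_ball, gamma, temperature, step) are assumptions made for illustration.

```python
import numpy as np


def loss_and_grads(a, W, X, y):
    """Squared loss of f(x) = mean_i a_i * tanh(w_i . x) and per-particle gradients."""
    n = X.shape[0]
    H = np.tanh(X @ W.T)                  # (n, N) hidden activations
    resid = H @ a / a.size - y            # (n,) prediction error
    loss = 0.5 * np.mean(resid ** 2)
    # Gradients are rescaled by N (mean field scaling): each particle feels an O(1) force.
    grad_a = H.T @ resid / n                                     # (N,)
    grad_W = ((resid[:, None] * (1.0 - H ** 2)) * a).T @ X / n   # (N, dim)
    return loss, grad_a, grad_W


def simulate_heavy_ball(n_particles=50, n_samples=200, dim=2, gamma=1.0,
                        temperature=0.0, step=0.01, n_steps=2000, seed=0):
    """Discretized (optionally noisy) heavy ball dynamics; returns the loss history."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, dim))
    y = np.tanh(X @ rng.normal(size=dim))      # hypothetical single-neuron teacher
    a = rng.normal(size=n_particles)           # output weights (one per particle)
    W = rng.normal(size=(n_particles, dim))    # input weights (one row per particle)
    va, vW = np.zeros_like(a), np.zeros_like(W)  # velocities start at rest
    sigma = np.sqrt(2.0 * gamma * temperature * step)  # noise scale; 0 means deterministic
    losses = []
    for _ in range(n_steps):
        loss, ga, gW = loss_and_grads(a, W, X, y)
        losses.append(loss)
        # Velocity update: damping, gradient force, and optional Gaussian noise.
        va += -step * (gamma * va + ga) + sigma * rng.normal(size=a.shape)
        vW += -step * (gamma * vW + gW) + sigma * rng.normal(size=W.shape)
        # Position update uses the freshly updated velocity (semi-implicit Euler).
        a += step * va
        W += step * vW
    losses.append(loss_and_grads(a, W, X, y)[0])
    return np.array(losses)


if __name__ == "__main__":
    history = simulate_heavy_ball()
    print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

Varying n_particles in this sketch gives a rough sense of how the finite-width dynamics behave; it does not reproduce the paper's experiments or verify its asymptotic guarantee.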
Abstract
Recent results have shown that for two-layer fully connected neural networks, gradient flow converges to a global optimum in the infinite width limit, by making a connection between the mean field dynamics and the Wasserstein gradient flow. These results were derived for first-order gradient flow, and a natural question is whether second-order dynamics, i.e., dynamics with momentum, exhibit a similar guarantee. We show that the answer is positive for the heavy ball method. In this case, the resulting integro-PDE is a nonlinear kinetic Fokker-Planck equation, and unlike the first-order case, it has no apparent connection with the Wasserstein gradient flow. Instead, we study the variations of a Lyapunov functional along the solution trajectories to characterize the stationary points and to prove convergence. While our results are asymptotic in the mean field limit, numerical simulations…
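For orientation, a generic nonlinear kinetic Fokker-Planck equation of the kind named in the abstract can be written as follows. This is the standard form associated with noisy, damped second-order dynamics, shown under assumed notation: position θ, velocity r, damping γ, inverse temperature β, and a potential Ψ[μ] that depends on the current distribution μ. The paper's exact scaling and potential may differ.

```latex
\begin{aligned}
d\theta_t &= r_t\,dt,\\
dr_t &= -\bigl(\gamma\, r_t + \nabla_\theta \Psi[\mu_t](\theta_t)\bigr)\,dt + \sqrt{2\gamma\beta^{-1}}\;dW_t,\\
\partial_t \mu_t &= -\,r\cdot\nabla_\theta \mu_t
  + \nabla_r\cdot\bigl(\mu_t\,(\gamma\, r + \nabla_\theta \Psi[\mu_t])\bigr)
  + \gamma\beta^{-1}\,\Delta_r \mu_t .
\end{aligned}
```

The equation is nonlinear because the drift depends on μ through Ψ[μ], and that dependence typically enters as an integral over μ, which is what makes it an integro-PDE.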
Taxonomy
Topics: Model Reduction and Neural Networks · Advanced Neuroimaging Techniques and Applications · Neural Networks and Applications
