Mean-Field Langevin Dynamics and Energy Landscape of Neural Networks

Kaitong Hu; Zhenjie Ren; David Siska; Lukasz Szpruch

arXiv:1905.07769 · math.PR · December 15, 2020 · 30 cites

Open Access

TL;DR

This paper studies the convergence of Mean-Field Langevin Dynamics (MFLD) as a model of neural network training. It shows exponential convergence to the unique minimizer of an energy functional on the space of probability measures, without assuming that the interaction potential is symmetric or of convolution type.

Contribution

It introduces a novel convergence proof for MFLD based on a generalized LaSalle invariance principle and the HWI inequality, applicable to general convex objectives and non-symmetric interactions.

Findings

01. Proves exponential convergence of MFLD to a stationary distribution.

02. Shows that the error between the finite-dimensional and infinite-dimensional problems decays on the order of 1/N, where N is the number of parameters.

03. Establishes convergence without assuming that the interaction potential is symmetric or of convolution type.
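The dynamics behind these findings can be illustrated with a small particle simulation. The following is a minimal sketch, not the authors' code: it assumes a toy 1-D regression task, a two-layer tanh network in mean-field scaling, a quadratic confining potential U(x) = |x|^2 / 2, and an Euler-Maruyama time discretisation. The function name `simulate_mfld` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_mfld(n_particles=200, n_steps=500, step=0.05, sigma=0.2, seed=0):
    """Euler-Maruyama sketch of a particle system approximating MFLD.

    Each particle x_i = (a_i, w_i, b_i) is one hidden neuron of the network
    f(z) = (1/N) * sum_i a_i * tanh(w_i * z + b_i)   (mean-field scaling).
    """
    rng = np.random.default_rng(seed)
    z = np.linspace(-2.0, 2.0, 64)   # toy inputs (assumed data, not from the paper)
    y = np.sin(2.0 * z)              # toy targets
    x = rng.normal(size=(n_particles, 3))
    losses = []
    for _ in range(n_steps):
        a, w, b = x[:, 0:1], x[:, 1:2], x[:, 2:3]
        h = np.tanh(w * z + b)            # activations, shape (N, M)
        pred = (a * h).mean(axis=0)       # network output, shape (M,)
        resid = pred - y
        losses.append(0.5 * np.mean(resid ** 2))
        # Per-particle gradient of the squared loss, averaged over the data.
        # In mean-field scaling this plays the role of the intrinsic derivative.
        dh = 1.0 - h ** 2
        grad_a = (resid * h).mean(axis=1, keepdims=True)
        grad_w = (resid * a * dh * z).mean(axis=1, keepdims=True)
        grad_b = (resid * a * dh).mean(axis=1, keepdims=True)
        # Drift = loss gradient + (sigma^2 / 2) * grad U, with U(x) = |x|^2 / 2 (assumed).
        drift = np.hstack([grad_a, grad_w, grad_b]) + 0.5 * sigma ** 2 * x
        noise = rng.normal(size=x.shape)
        x = x - step * drift + sigma * np.sqrt(step) * noise
    return np.array(losses), x

if __name__ == "__main__":
    losses, particles = simulate_mfld()
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Increasing `n_particles` makes the empirical measure of the particles a closer approximation of the mean-field law, which is the regime the 1/N finding refers to. The sketch does not attempt to verify that rate.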

Abstract

Our work is motivated by a desire to study the theoretical underpinning for the convergence of stochastic gradient type algorithms widely used for non-convex learning tasks such as training of neural networks. The key insight, already observed in the works of Mei, Montanari and Nguyen (2018), Chizat and Bach (2018) as well as Rotskoff and Vanden-Eijnden (2018), is that a certain class of finite-dimensional non-convex problems becomes convex when lifted to the infinite-dimensional space of measures. We leverage this observation and show that the corresponding energy functional defined on the space of probability measures has a unique minimiser which can be characterised by a first-order condition using the notion of linear functional derivative. Next, we study the corresponding gradient flow structure in the 2-Wasserstein metric, which we call Mean-Field Langevin Dynamics (MFLD), and show…
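The objects named in the abstract can be written out as follows. This is a sketch in assumed notation (the symbols F, U, sigma, H and g are chosen here for illustration and should be checked against the paper): F is the convex loss lifted to probability measures, H is relative entropy with respect to a Gibbs density g, and m_t is the law of the process at time t.

```latex
% Regularised energy on probability measures (assumed notation)
V^{\sigma}(m) = F(m) + \frac{\sigma^{2}}{2} H(m),
\qquad
H(m) = \int m(x) \log\frac{m(x)}{g(x)} \, dx,
\qquad
g(x) \propto e^{-U(x)}.

% First-order condition for the minimiser m^{*},
% stated via the linear functional derivative \delta F / \delta m
\frac{\delta F}{\delta m}(m^{*}, x)
  + \frac{\sigma^{2}}{2} \log m^{*}(x)
  + \frac{\sigma^{2}}{2} U(x)
  = \text{constant in } x.

% Mean-Field Langevin Dynamics, with D_m F = \nabla_x \frac{\delta F}{\delta m}
dX_t = -\Big( D_m F(m_t, X_t) + \frac{\sigma^{2}}{2} \nabla U(X_t) \Big) dt
       + \sigma \, dW_t,
\qquad
m_t = \mathrm{Law}(X_t).
```

Under this reading, the first-order condition says that a stationary density of the dynamics balances the loss gradient against the entropic regularisation, which is why the minimiser of the energy and the invariant law of MFLD coincide.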

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Advanced Neuroimaging Techniques and Applications · Stochastic Gradient Optimization Techniques

Methods: Convolution