On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Lenaic Chizat (SIERRA), Francis Bach (SIERRA)

TL;DR
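This paper shows that, for over-parameterized models such as neural networks with a single large hidden layer, continuous-time gradient descent on a measure discretized into particles converges to global minimizers, provided it is initialized correctly and taken in the many-particle limit. The proof relies on Wasserstein gradient flows from optimal transport theory, and numerical experiments indicate that this behavior already appears with a reasonable number of particles.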
Contribution
It establishes global convergence of gradient descent for over-parameterized models, in the many-particle limit and under correct initialization, by analyzing the dynamics of the particles as a Wasserstein gradient flow.
Findings
Although the problem is non-convex, the gradient flow converges to global minimizers when initialized correctly and in the many-particle limit.
Numerical experiments confirm asymptotic behavior with a reasonable number of particles.
The experiments exhibit this behavior even in high dimension.
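The findings above can be written schematically as follows. This is only a sketch: the symbols J, R, G, \phi, \Theta, w_i, and \theta_i are notation assumed here for illustration, and the precise assumptions on the objective and on the initialization are those stated in the paper, not reproduced here.

```latex
% Convex objective over measures on a parameter space \Theta (notation assumed):
\min_{\mu \in \mathcal{M}(\Theta)} J(\mu),
\qquad
J(\mu) = R\!\left( \int_{\Theta} \phi(\theta)\, \mathrm{d}\mu(\theta) \right) + G(\mu),
\quad R,\ G \text{ convex}.

% Discretization into m particles with weights w_i and positions \theta_i:
\mu_m = \frac{1}{m} \sum_{i=1}^{m} w_i\, \delta_{\theta_i},
\qquad
J_m(w, \theta) = J(\mu_m).

% Continuous-time gradient descent on weights and positions
% (the factor m is a time rescaling):
\frac{\mathrm{d}}{\mathrm{d}t}\,(w(t), \theta(t)) = -\, m\, \nabla J_m(w(t), \theta(t)).

% Informal statement of the result, under correct initialization:
\lim_{m,\, t \to \infty} J(\mu_{m,t}) = \min_{\mu \in \mathcal{M}(\Theta)} J(\mu).
```

The objective J is convex in \mu, but J_m is non-convex in (w, \theta), which is why a global convergence guarantee is not automatic.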
Abstract
Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.
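The minimization method described in the abstract can be illustrated with a small discrete-time sketch. The code below is not the authors' implementation: it replaces the continuous-time flow with plain gradient descent steps, and the ReLU activation, squared loss, teacher-generated data, and the names `make_teacher_data` and `particle_gradient_descent` are all assumptions made for this example. Each hidden unit plays the role of one particle with weight `w[i]` and position `theta[i]`.

```python
import numpy as np


def make_teacher_data(n=200, d=5, n_teacher=3, seed=0):
    """Hypothetical synthetic data: targets come from a small ReLU teacher network."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    theta_star = rng.standard_normal((n_teacher, d))
    y = np.maximum(X @ theta_star.T, 0.0).mean(axis=1)
    return X, y


def particle_gradient_descent(X, y, m=50, steps=300, lr=0.1, seed=0):
    """Gradient descent on the weights and positions of m particles.

    Model: f(x) = (1/m) * sum_i w[i] * relu(theta[i] . x)
    Loss:  0.5 * mean((f(x) - y)^2)
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(m)            # particle weights
    theta = rng.standard_normal((m, d))   # particle positions
    losses = []
    for _ in range(steps):
        pre = X @ theta.T                 # (n, m) pre-activations
        act = np.maximum(pre, 0.0)        # ReLU features
        resid = act @ w / m - y           # (n,) prediction errors
        losses.append(0.5 * np.mean(resid ** 2))
        # Gradients of the loss multiplied by m, so that each particle
        # moves at a speed that does not vanish as m grows.
        grad_w = act.T @ resid / n
        grad_theta = ((resid[:, None] * (pre > 0)) * w[None, :]).T @ X / n
        w = w - lr * grad_w
        theta = theta - lr * grad_theta
    # Record the loss at the final parameters as well.
    resid = np.maximum(X @ theta.T, 0.0) @ w / m - y
    losses.append(0.5 * np.mean(resid ** 2))
    return w, theta, np.array(losses)
```

Calling `particle_gradient_descent(*make_teacher_data(), m=50)` returns the final weights, positions, and the loss trajectory. Increasing `m` is the discrete analogue of moving toward the many-particle limit that the abstract refers to; this toy run makes no claim about reproducing the paper's experiments.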
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Advanced Neuroimaging Techniques and Applications
