On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

Lenaic Chizat (SIERRA); Francis Bach (SIERRA)

arXiv:1805.09545 · math.OC · October 30, 2018 · 95 cites

TL;DR

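This paper shows that, for over-parameterized models, a continuous-time gradient flow on the weights and positions of particles that discretize a measure converges to global minimizers, provided it is initialized correctly and in the many-particle limit. The proof relies on optimal transport theory, and numerical experiments suggest that this behavior already appears with a moderate number of particles.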

Contribution

It establishes global convergence, in the many-particle limit, of an idealized (continuous-time) form of gradient descent on over-parameterized models, by interpreting the particle dynamics as a Wasserstein gradient flow from optimal transport theory.
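For orientation, the setting can be written in a simplified form. The notation below ($\Phi$, $R$, $J$, $\mu_m$) is ours, covers only the unregularized case where each particle carries its own output weight, and is a sketch rather than the paper's exact formulation. For a single-hidden-layer network, a particle is $\theta = (a, b)$ and $\Phi(\theta)$ is the function $x \mapsto a\,\sigma(b \cdot x)$.

```latex
\begin{aligned}
% Convex objective over measures \mu on a parameter space \Omega
J(\mu) &= R\!\left(\int_\Omega \Phi(\theta)\,\mathrm{d}\mu(\theta)\right), \qquad R \text{ convex} \\
% Particle discretization and the finite-dimensional objective
\mu_m &= \frac{1}{m}\sum_{i=1}^{m} \delta_{\theta_i}, \qquad F_m(\theta_1,\dots,\theta_m) = J(\mu_m) \\
% Rescaled particle gradient flow, written with the first variation J'[\mu]
\frac{\mathrm{d}\theta_i}{\mathrm{d}t} &= -m\,\nabla_{\theta_i} F_m = -\nabla_\theta J'[\mu_m](\theta_i),
\qquad J'[\mu](\theta) = \left\langle \nabla R\!\left(\int_\Omega \Phi\,\mathrm{d}\mu\right), \Phi(\theta) \right\rangle \\
% Many-particle limit: a Wasserstein gradient flow (continuity equation)
\partial_t \mu_t &= \operatorname{div}\!\left(\mu_t\, \nabla J'[\mu_t]\right)
\end{aligned}
```

Here $J$ is convex in $\mu$ because $\mu \mapsto \int \Phi\,\mathrm{d}\mu$ is linear, whereas $F_m$ is in general non-convex in $(\theta_1,\dots,\theta_m)$; the many-particle analysis is what bridges this gap.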

Findings

01

With a suitable initialization and in the many-particle limit, the gradient flow converges to global minimizers, even though it is non-convex in the particles.

02

Numerical experiments indicate that this asymptotic behavior already appears with a reasonable number of particles.

03

This behavior is observed numerically even in high dimension.

Abstract

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.
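The discretize-then-descend procedure described in the abstract can be sketched for a single-hidden-layer ReLU network. This is a minimal illustration under our own assumptions, not the authors' code: each hidden unit is one particle with a weight `a_i` and a position `b_i`, the loss is half the mean squared error, initialization is i.i.d. Gaussian, and the continuous-time flow is replaced by plain gradient descent with a fixed step size `lr`. The function names `predict`, `loss`, and `particle_gradient_descent` are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def predict(a, B, X):
    """Mean-field network: f(x) = (1/m) * sum_i a_i * relu(b_i . x)."""
    m = a.shape[0]
    return relu(X @ B.T) @ a / m


def loss(a, B, X, y):
    """Half mean squared error, a convex function of the prediction vector."""
    r = predict(a, B, X) - y
    return 0.5 * np.mean(r ** 2)


def particle_gradient_descent(X, y, m, steps=1000, lr=0.1, seed=0):
    """Hypothetical sketch: gradient descent on m particles (a_i, b_i).

    Gradients are multiplied by m, mirroring the m * grad F_m scaling
    of the mean-field particle flow; Gaussian init is an assumption and
    does not check the paper's initialization conditions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    a = rng.normal(size=m)           # particle weights
    B = rng.normal(size=(m, d))      # particle positions
    for _ in range(steps):
        H = X @ B.T                  # (n, m) pre-activations
        A = relu(H)                  # (n, m) activations
        r = (A @ a / m - y) / n      # (n,) derivative of loss w.r.t. predictions
        # Per-particle gradients, already multiplied by m.
        grad_a = A.T @ r                              # (m,)
        grad_B = ((H > 0) * np.outer(r, a)).T @ X     # (m, d)
        a -= lr * grad_a
        B -= lr * grad_B
    return a, B
```

The `1/m` factor in the output and the compensating factor `m` on the gradient keep each particle's velocity of order one, so runs with different `m` stay comparable; increasing `m` is the numerical analogue of the many-particle limit.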

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods