Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization

Francis Bach (SIERRA); Lénaïc Chizat (EPFL)

arXiv:2110.08084·cs.LG·October 18, 2021

TL;DR

This review paper shows how qualitative global convergence guarantees can be derived for gradient descent on two-layer neural networks with homogeneous activation functions in the limit of infinitely many hidden neurons, and what this implies for optimization and generalization.

Contribution

It shows that, as the number of hidden neurons tends to infinity, two-layer neural networks with homogeneous activation functions admit qualitative convergence guarantees, even though the underlying optimization problem is non-convex.

Findings

01 Global convergence guarantees for infinitely wide neural networks

02 Insights into the optimization landscape of large neural networks

03 Theoretical understanding of generalization in wide neural networks

Abstract

Many supervised machine learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many mathematical guarantees exist. Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain. In this review paper, we consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived.
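The setting described in the abstract can be illustrated with a small finite-width simulation. The sketch below is not taken from the paper or its repository. It assumes a ReLU activation (one example of a homogeneous activation), a squared loss, an output averaged over the m hidden neurons, a step size scaled by m, and synthetic data generated by a hypothetical three-neuron teacher network. The function names `make_data`, `predict`, and `train` are illustrative only.

```python
import numpy as np


def make_data(n=64, d=2, seed=0):
    """Synthetic regression data from a small ReLU teacher (an assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # inputs on the unit sphere
    B_star = rng.standard_normal((3, d))           # teacher input weights
    a_star = np.array([1.0, -1.0, 1.0])            # teacher output weights
    y = np.maximum(X @ B_star.T, 0.0) @ a_star / 3.0
    return X, y


def predict(a, B, X):
    """Two-layer ReLU network, averaged over the m hidden neurons."""
    return np.maximum(X @ B.T, 0.0) @ a / a.shape[0]


def train(X, y, m=200, steps=500, lr=0.2, seed=1):
    """Full-batch gradient descent on the squared loss; returns loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    a = rng.standard_normal(m)       # output weights
    B = rng.standard_normal((m, d))  # input weights
    losses = []
    for _ in range(steps):
        pre = X @ B.T                # pre-activations, shape (n, m)
        act = np.maximum(pre, 0.0)   # ReLU activations
        resid = act @ a / m - y      # prediction error, shape (n,)
        losses.append(0.5 * np.mean(resid ** 2))
        # Exact gradients of the loss with respect to a and B.
        grad_a = act.T @ resid / (n * m)
        grad_B = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ X / (n * m)
        # Step size multiplied by m so that each neuron moves at a rate
        # that does not vanish as m grows (an assumed scaling choice).
        a -= lr * m * grad_a
        B -= lr * m * grad_B
    losses.append(0.5 * np.mean((predict(a, B, X) - y) ** 2))
    return a, B, np.array(losses)


if __name__ == "__main__":
    X, y = make_data()
    for m in (10, 100, 1000):
        _, _, losses = train(X, y, m=m)
        print(f"m={m:5d}  initial loss={losses[0]:.4f}  final loss={losses[-1]:.4f}")
```

Running the script for increasing m gives a rough empirical feel for how the training loss behaves as the hidden layer widens. It does not reproduce any result from the paper, and the guarantees reviewed there concern the infinite-width limit rather than any fixed m.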

Code & Models

Repositories

lchizat/2021-exp-icm
none · Official
