On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

Arnulf Jentzen; Adrian Riekert

arXiv:2112.09684·math.OC·July 14, 2022

TL;DR

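This paper proves convergence of the risk of gradient descent with random initializations in the training of deep ReLU neural networks, assuming that the input density and the target function are piecewise polynomial and that the risk function admits at least one regular global minimum. It also analyzes solutions of gradient flow equations for such networks.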

Contribution

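The convergence result covers fully-connected feedforward ReLU ANNs with an arbitrarily large number of hidden layers. For shallow ANNs with one hidden layer and one-dimensional input, the paper additionally addresses the existence of global minima, and it analyzes gradient flow trajectories.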

Findings

01. Convergence of the risk of gradient descent with random initializations for deep ReLU ANNs.

02. Existence of global minima for shallow ANNs (one hidden layer, one-dimensional input) with Lipschitz targets.

03. Polynomial rate of convergence for gradient flow trajectories.

Abstract

In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumption that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, under the assumption that the target function (describing the relationship between input data and the output data) is piecewise polynomial, and under the assumption that the risk function of the considered supervised learning problem admits at least one regular global minimum. In addition, in the special situation of shallow ANNs with just one hidden layer and one-dimensional input we also verify this assumption by proving in the training of such shallow ANNs…
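To make the setting concrete, the following is a minimal sketch of plain gradient descent on the risk of a shallow ReLU ANN with one-dimensional input. It is an illustration only and not the paper's construction or experiment. Everything specific in it is an assumption chosen for the example: the function names (`realization`, `target`, `risk`, `gradient`, `gradient_descent`), the uniform input density on [0, 1] (a piecewise polynomial density), the target |x - 0.5| (piecewise polynomial and Lipschitz), the grid approximation of the risk integral, the uniform random initialization, and the step size.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def realization(theta, x):
    """Shallow ReLU ANN with 1-D input: sum_j v_j * relu(w_j * x + b_j) + c."""
    w, b, v, c = theta
    return sum(vj * relu(wj * x + bj) for wj, bj, vj in zip(w, b, v)) + c


def target(x):
    # Assumed target: piecewise linear, hence piecewise polynomial and Lipschitz.
    return abs(x - 0.5)


def risk(theta, xs):
    # Grid approximation of the L^2 risk under the uniform density on [0, 1].
    return sum((realization(theta, x) - target(x)) ** 2 for x in xs) / len(xs)


def gradient(theta, xs):
    """Generalized gradient of the risk; the ReLU derivative is taken as 0 at the kink."""
    w, b, v, c = theta
    width = len(w)
    gw, gb, gv, gc = [0.0] * width, [0.0] * width, [0.0] * width, 0.0
    for x in xs:
        err = 2.0 * (realization(theta, x) - target(x)) / len(xs)
        gc += err
        for j in range(width):
            pre = w[j] * x + b[j]
            if pre > 0.0:
                gv[j] += err * pre
                gw[j] += err * v[j] * x
                gb[j] += err * v[j]
    return gw, gb, gv, gc


def gradient_descent(width=8, steps=300, lr=0.01, seed=0, grid=100):
    rng = random.Random(seed)
    xs = [(i + 0.5) / grid for i in range(grid)]  # midpoint grid on [0, 1]
    # Random initialization (assumed distribution: uniform on [-1, 1]).
    theta = (
        [rng.uniform(-1.0, 1.0) for _ in range(width)],
        [rng.uniform(-1.0, 1.0) for _ in range(width)],
        [rng.uniform(-1.0, 1.0) for _ in range(width)],
        0.0,
    )
    history = [risk(theta, xs)]
    for _ in range(steps):
        gw, gb, gv, gc = gradient(theta, xs)
        w, b, v, c = theta
        theta = (
            [p - lr * g for p, g in zip(w, gw)],
            [p - lr * g for p, g in zip(b, gb)],
            [p - lr * g for p, g in zip(v, gv)],
            c - lr * gc,
        )
        history.append(risk(theta, xs))
    return theta, history


if __name__ == "__main__":
    _, history = gradient_descent()
    print("initial risk:", history[0])
    print("final risk:", history[-1])
```

Running the script prints the approximate risk before and after training for one random initialization. Changing `seed` gives a different initialization, which is the quantity the random-initialization setting varies; the sketch makes no claim about reaching a global minimum in any single run.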
