A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Arnulf Jentzen; Adrian Riekert

arXiv:2104.00277·math.NA·September 28, 2022

TL;DR

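This paper proves that the risk of stochastic gradient descent converges to zero when training a shallow (one-hidden-layer) ReLU neural network on a constant target function, provided the learning rates are sufficiently small and the input data are independent and identically distributed.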

Contribution

It provides a rigorous convergence proof for SGD in training shallow ReLU neural networks with constant targets, a case previously lacking formal guarantees.

Findings

01 The risk of the SGD process converges to zero for constant target functions

02 Convergence holds for sufficiently small learning rates and i.i.d. input data

03 The result applies to networks with one hidden layer and ReLU activation

Abstract

In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
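
The setting described in the abstract can be illustrated numerically with a minimal NumPy sketch. It is not the authors' code and proves nothing; it only shows the kind of SGD process the result concerns. Everything specific here is an assumption made for illustration: the dimensions `d = 2` and `H = 8`, the constant target `xi = 1.0`, inputs drawn uniformly from the unit square, the Gaussian initialization, the batch size, the constant learning rate `0.01`, and the convention that the ReLU derivative at zero is taken to be zero.

```python
import numpy as np


def network(params, x):
    """Shallow ReLU network: d inputs, H hidden neurons, one output."""
    W, b, v, c = params
    hidden = np.maximum(x @ W.T + b, 0.0)  # shape (n, H)
    return hidden @ v + c                  # shape (n,)


def risk(params, x, xi):
    """Empirical mean squared error against the constant target xi."""
    return float(np.mean((network(params, x) - xi) ** 2))


def sgd_step(params, x, xi, lr):
    """One SGD step on the squared error of a mini-batch x of shape (n, d)."""
    W, b, v, c = params
    n = x.shape[0]
    pre = x @ W.T + b                      # pre-activations, shape (n, H)
    hidden = np.maximum(pre, 0.0)
    resid = hidden @ v + c - xi            # residuals, shape (n,)
    active = (pre > 0.0).astype(float)     # assumed ReLU derivative (0 at 0)

    g_c = 2.0 * resid.mean()
    g_v = 2.0 * (hidden.T @ resid) / n
    back = 2.0 * resid[:, None] * active * v   # shape (n, H)
    g_b = back.mean(axis=0)
    g_W = back.T @ x / n                   # shape (H, d)
    return (W - lr * g_W, b - lr * g_b, v - lr * g_v, c - lr * g_c)


# Illustrative choices only; none of these values come from the paper.
rng = np.random.default_rng(0)
d, H, xi = 2, 8, 1.0
params = (
    0.5 * rng.standard_normal((H, d)),  # hidden weights
    np.zeros(H),                        # hidden biases
    0.5 * rng.standard_normal(H),       # output weights
    0.0,                                # output bias
)

x_eval = rng.uniform(0.0, 1.0, size=(2000, d))  # held-out i.i.d. inputs
initial_risk = risk(params, x_eval, xi)

for _ in range(5000):
    batch = rng.uniform(0.0, 1.0, size=(32, d))  # fresh i.i.d. mini-batch
    params = sgd_step(params, batch, xi, lr=0.01)

final_risk = risk(params, x_eval, xi)
print(f"initial risk: {initial_risk:.4f}, final risk: {final_risk:.6f}")
```

Each step draws a fresh mini-batch, mirroring the i.i.d. input assumption, and the learning rate is kept small, as the theorem requires. The printed values are Monte Carlo estimates of the risk on a fixed evaluation sample, so they only approximate the quantity the paper analyzes.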

Taxonomy

Methods · Stochastic Gradient Descent