A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
Arnulf Jentzen, Adrian Riekert

TL;DR
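This paper proves that the risk of stochastic gradient descent converges to zero when training a one-hidden-layer neural network with ReLU activation on a constant target function, provided the learning rates are sufficiently small and the input data are independent and identically distributed.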
Contribution
It provides a rigorous convergence proof for SGD in training shallow ReLU neural networks with constant targets, a case previously lacking formal guarantees.
Findings
The risk of the SGD process converges to zero for constant target functions
Convergence holds for sufficiently small learning rates and i.i.d. input data
Applies to fully-connected feedforward networks with one hidden layer and ReLU activation
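For orientation, the setting behind these findings can be written out as a short sketch. The symbols used here (d, H, w, b, v, c, ξ, μ, γ_n, G) are notation assumed for illustration and may differ from the paper's own; G stands for a (generalized) gradient of the per-sample squared loss, and the precise sense in which the limit holds is specified in the paper, not here.

```latex
% Network with d inputs, H hidden ReLU neurons, and one output (assumed notation)
\mathcal{N}_\theta(x) = c + \sum_{j=1}^{H} v_j \max\Bigl\{ b_j + \sum_{i=1}^{d} w_{j,i}\, x_i ,\; 0 \Bigr\}

% Risk for a constant target \xi and input distribution \mu
\mathcal{L}(\theta) = \mathbb{E}_{X \sim \mu}\bigl[ (\mathcal{N}_\theta(X) - \xi)^2 \bigr]

% SGD step with learning rate \gamma_n and i.i.d. samples X_n \sim \mu
\Theta_{n+1} = \Theta_n - \gamma_n \, G(\Theta_n, X_n)

% Statement summarized in the findings
\mathcal{L}(\Theta_n) \to 0 \quad \text{as } n \to \infty
```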
Abstract
In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with d ∈ ℕ neurons on the input layer, H ∈ ℕ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
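The setup described in the abstract can be illustrated with a small numerical experiment. The following is a minimal sketch, not the authors' code: the function names, the initialization, the uniform input distribution on [0, 1]^d, the constant learning rate, and the default sizes are all assumptions chosen for illustration. It trains a one-hidden-layer ReLU network with single-sample SGD on a constant target and reports a Monte Carlo estimate of the risk before and after training.

```python
import numpy as np


def init_params(d, H, rng):
    """Random start for a network with d inputs, H hidden ReLU neurons, one output.
    The scaling is an illustrative choice, not taken from the paper."""
    return {
        "W": rng.normal(scale=1.0 / np.sqrt(d), size=(H, d)),  # hidden weights
        "b": np.zeros(H),                                      # hidden biases
        "v": rng.normal(scale=1.0 / np.sqrt(H), size=H),       # output weights
        "c": 0.0,                                              # output bias
    }


def predict(p, X):
    """Network output for a batch X of shape (m, d)."""
    hidden = np.maximum(X @ p["W"].T + p["b"], 0.0)
    return hidden @ p["v"] + p["c"]


def estimate_risk(p, X, target):
    """Monte Carlo estimate of the mean squared error against a constant target."""
    return float(np.mean((predict(p, X) - target) ** 2))


def sgd_step(p, x, target, lr):
    """One SGD step on the squared loss of a single input x of shape (d,)."""
    pre = p["W"] @ x + p["b"]
    hidden = np.maximum(pre, 0.0)
    err = hidden @ p["v"] + p["c"] - target
    # Convention assumed here: the ReLU derivative is taken to be 0 at the kink.
    active = (pre > 0.0).astype(float)
    grad_b = 2.0 * err * p["v"] * active
    grad_v = 2.0 * err * hidden
    p["W"] -= lr * np.outer(grad_b, x)
    p["b"] -= lr * grad_b
    p["v"] -= lr * grad_v
    p["c"] -= lr * 2.0 * err


def train(d=2, H=5, target=1.0, lr=0.01, steps=20000, seed=0):
    """Run SGD on i.i.d. uniform inputs; return (risk before, risk after)."""
    rng = np.random.default_rng(seed)
    p = init_params(d, H, rng)
    X_eval = rng.uniform(0.0, 1.0, size=(2000, d))  # held-out sample for the risk estimate
    risk_before = estimate_risk(p, X_eval, target)
    for _ in range(steps):
        x = rng.uniform(0.0, 1.0, size=d)  # fresh i.i.d. input at every step
        sgd_step(p, x, target, lr)
    return risk_before, estimate_risk(p, X_eval, target)


if __name__ == "__main__":
    before, after = train()
    print(f"estimated risk before: {before:.6f}, after: {after:.6f}")
```

A single run like this only illustrates the behavior the theorem describes; it does not verify the result, and the learning rate that counts as "sufficiently small" in the paper is not the one assumed here.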
Taxonomy
Methods · Stochastic Gradient Descent
