Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation

Arnulf Jentzen; Timo Welti

arXiv:2003.01291 · math.ST · March 4, 2020 · 1 cites

TL;DR

This paper gives a mathematically rigorous full error analysis of deep neural network training by stochastic gradient descent with random initialisation, for empirical risk minimisation with quadratic loss. The convergence speed obtained is presumably far from optimal and suffers from the curse of dimensionality.

Contribution

It offers the first full error analysis in the literature for deep learning algorithms trained with stochastic gradient descent and random initialisation in the probabilistically strong sense.

Findings

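01 Established a full error analysis, in the probabilistically strong sense, for deep learning based empirical risk minimisation with quadratic loss trained via stochastic gradient descent with random initialisation.

02 The convergence speed obtained is presumably far from optimal and suffers from the curse of dimensionality.

03 Contributed to a rigorous mathematical understanding of why deep learning based algorithms produce useful results.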

Abstract

In spite of the accomplishments of deep learning based algorithms in numerous applications and very broad corresponding research interest, at the moment there is still no rigorous understanding of the reasons why such algorithms produce useful results in certain situations. A thorough mathematical analysis of deep learning based algorithms seems to be crucial in order to improve our understanding and to make their implementation more effective and efficient. In this article we provide a mathematically rigorous full error analysis of deep learning based empirical risk minimisation with quadratic loss function in the probabilistically strong sense, where the underlying deep neural networks are trained using stochastic gradient descent with random initialisation. The convergence speed we obtain is presumably far from optimal and suffers under the curse of dimensionality. To the best of our…
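The setting named in the abstract (empirical risk minimisation with quadratic loss, trained by stochastic gradient descent from random initialisations) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy target y = x^2, the one-hidden-layer ReLU network, the uniform initialisation on [-1, 1], the hyperparameters, the rule of keeping the run with the smallest empirical risk, and all function names.

```python
import random


def make_data(n, rng):
    # Hypothetical toy regression sample: inputs in [0, 1], targets y = x^2.
    xs = [rng.random() for _ in range(n)]
    return [(x, x * x) for x in xs]


def init_params(width, rng):
    # Random initialisation (assumed): every parameter uniform on [-1, 1].
    return {
        "w1": [rng.uniform(-1.0, 1.0) for _ in range(width)],
        "b1": [rng.uniform(-1.0, 1.0) for _ in range(width)],
        "w2": [rng.uniform(-1.0, 1.0) for _ in range(width)],
        "b2": rng.uniform(-1.0, 1.0),
    }


def forward(p, x):
    # One-hidden-layer ReLU network with scalar input and scalar output.
    hidden = [max(0.0, w * x + b) for w, b in zip(p["w1"], p["b1"])]
    out = sum(v * h for v, h in zip(p["w2"], hidden)) + p["b2"]
    return out, hidden


def empirical_risk(p, data):
    # Quadratic loss averaged over the whole training sample.
    total = 0.0
    for x, y in data:
        d = forward(p, x)[0] - y
        total += d * d
    return total / len(data)


def sgd(p, data, steps, batch_size, lr, rng):
    # Plain minibatch SGD on the quadratic loss, gradients written out by hand.
    width = len(p["w1"])
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        g_w1 = [0.0] * width
        g_b1 = [0.0] * width
        g_w2 = [0.0] * width
        g_b2 = 0.0
        for x, y in batch:
            out, hidden = forward(p, x)
            err = 2.0 * (out - y) / batch_size  # d(loss)/d(out), batch-averaged
            g_b2 += err
            for j in range(width):
                g_w2[j] += err * hidden[j]
                if hidden[j] > 0.0:  # ReLU passes gradient only when active
                    g_w1[j] += err * p["w2"][j] * x
                    g_b1[j] += err * p["w2"][j]
        for j in range(width):
            p["w1"][j] -= lr * g_w1[j]
            p["b1"][j] -= lr * g_b1[j]
            p["w2"][j] -= lr * g_w2[j]
        p["b2"] -= lr * g_b2
    return p


def train_with_restarts(data, width=8, restarts=5, steps=2000,
                        batch_size=8, lr=0.02, seed=0):
    # Run SGD from several independent random initialisations and keep the
    # parameters with the smallest empirical risk (an assumed selection rule).
    rng = random.Random(seed)
    best, best_risk = None, float("inf")
    for _ in range(restarts):
        p = sgd(init_params(width, rng), data, steps, batch_size, lr, rng)
        risk = empirical_risk(p, data)
        if risk < best_risk:
            best, best_risk = p, risk
    return best, best_risk


data = make_data(64, random.Random(1))
best_params, best_risk = train_with_restarts(data)
print(f"best empirical risk over restarts: {best_risk:.5f}")
```

The sketch only measures the empirical risk on the training sample. A full error analysis of the kind the abstract describes concerns the overall error of the trained network, which this toy example does not estimate.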

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Mathematical Approximation and Integration · Statistical Methods and Inference