Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
Arnulf Jentzen, Timo Welti

TL;DR
This paper gives a mathematically rigorous full error analysis of deep neural network training by stochastic gradient descent with random initialisation. The convergence rate it obtains is presumably far from optimal and suffers from the curse of dimensionality.
Contribution
To the authors' knowledge, it offers the first full error analysis in the literature, in the probabilistically strong sense, for a deep learning algorithm trained with stochastic gradient descent and random initialisation.
Findings
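Established a full error analysis for deep learning based empirical risk minimisation with quadratic loss, trained by stochastic gradient descent with random initialisation.
Obtained a convergence rate that is presumably far from optimal and suffers from the curse of dimensionality.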
Provided insights into the mathematical foundations of deep neural network training.
Abstract
In spite of the accomplishments of deep learning based algorithms in numerous applications and very broad corresponding research interest, at the moment there is still no rigorous understanding of the reasons why such algorithms produce useful results in certain situations. A thorough mathematical analysis of deep learning based algorithms seems to be crucial in order to improve our understanding and to make their implementation more effective and efficient. In this article we provide a mathematically rigorous full error analysis of deep learning based empirical risk minimisation with quadratic loss function in the probabilistically strong sense, where the underlying deep neural networks are trained using stochastic gradient descent with random initialisation. The convergence speed we obtain is presumably far from optimal and suffers under the curse of dimensionality. To the best of our…
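The training setup named in the abstract (empirical risk minimisation with quadratic loss, stochastic gradient descent, random initialisation) can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm or notation: the one-hidden-layer ReLU network, the uniform initialisation on [-1, 1], the step size, and the rule of keeping the restart with the smallest empirical risk are all assumptions made for illustration, and the names `init_params`, `sgd_train`, and `train_with_restarts` are hypothetical.

```python
import numpy as np

def init_params(rng, d_in, width):
    # Random initialisation: uniform on [-1, 1] (an illustrative assumption).
    return {
        "W1": rng.uniform(-1.0, 1.0, size=(width, d_in)),
        "b1": rng.uniform(-1.0, 1.0, size=width),
        "w2": rng.uniform(-1.0, 1.0, size=width),
        "b2": float(rng.uniform(-1.0, 1.0)),
    }

def forward(p, X):
    # X has shape (n, d_in); returns hidden activations and predictions.
    H = np.maximum(X @ p["W1"].T + p["b1"], 0.0)  # ReLU hidden layer
    return H, H @ p["w2"] + p["b2"]

def empirical_risk(p, X, y):
    # Quadratic loss averaged over the whole sample.
    _, pred = forward(p, X)
    return float(np.mean((pred - y) ** 2))

def sgd_train(p, X, y, rng, steps=2000, batch=16, lr=0.01):
    # Plain mini-batch SGD on the quadratic loss, gradients written by hand.
    n = X.shape[0]
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        Xb, yb = X[idx], y[idx]
        H, pred = forward(p, Xb)
        g = 2.0 * (pred - yb) / batch          # d(loss)/d(pred), shape (batch,)
        gH = np.outer(g, p["w2"]) * (H > 0)    # backprop through ReLU
        grad_W1, grad_b1 = gH.T @ Xb, gH.sum(axis=0)
        grad_w2, grad_b2 = H.T @ g, float(g.sum())
        p["W1"] -= lr * grad_W1
        p["b1"] -= lr * grad_b1
        p["w2"] -= lr * grad_w2
        p["b2"] -= lr * grad_b2
    return p

def train_with_restarts(X, y, restarts=5, width=16, seed=0):
    # Run SGD from several independent random initialisations and keep the
    # parameters with the smallest empirical risk (illustrative selection rule).
    rng = np.random.default_rng(seed)
    best, best_risk = None, np.inf
    for _ in range(restarts):
        p = sgd_train(init_params(rng, X.shape[1], width), X, y, rng)
        risk = empirical_risk(p, X, y)
        if risk < best_risk:
            best, best_risk = p, risk
    return best, best_risk

if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    X = data_rng.uniform(0.0, 1.0, size=(256, 2))
    y = np.sin(X.sum(axis=1))                  # toy regression target
    _, risk = train_with_restarts(X, y)
    print("smallest empirical risk over restarts:", risk)
```

The quantity printed is only the empirical risk on the training sample. A full error analysis of the kind described in the abstract also has to account for how far that quantity is from the true risk, which this sketch does not attempt.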
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Mathematical Approximation and Integration · Statistical Methods and Inference
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
