Mathematical analysis of the gradients in deep learning
Steffen Dereich, Thang Do, Arnulf Jentzen, Frederic Weber

TL;DR
This paper provides a rigorous mathematical analysis of generalized gradients used in training deep neural networks with ReLU activations, clarifying their properties and relation to standard gradients.
Contribution
It introduces an approximation procedure for generalized gradients, proves they are limiting Fréchet subgradients, and shows they match standard gradients where the cost function is smooth.
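As a rough illustration of what an approximation procedure for a ReLU gradient can look like, the sketch below replaces the ReLU with a continuously differentiable surrogate and checks that its derivative converges pointwise as the smoothing parameter grows. The particular surrogate `smooth_relu`, the parameter `n`, and the limit convention (derivative 0 at the kink) are assumptions chosen for illustration; the paper's own construction may differ.

```python
def relu(x):
    """Rectified linear unit: x -> max{x, 0}."""
    return max(x, 0.0)


def smooth_relu(x, n):
    """Hypothetical C^1 surrogate: quadratic on (0, 1/n), linear beyond."""
    if x <= 0.0:
        return 0.0
    if x < 1.0 / n:
        return 0.5 * n * x * x
    return x - 0.5 / n


def smooth_relu_grad(x, n):
    """Derivative of smooth_relu; continuous in x for every n >= 1."""
    if x <= 0.0:
        return 0.0
    if x < 1.0 / n:
        return n * x
    return 1.0


def limit_grad(x):
    """Pointwise limit of smooth_relu_grad as n grows (0 at the kink)."""
    return 1.0 if x > 0.0 else 0.0


points = [-1.0, 0.0, 0.01, 1.0]
for n in (1, 10, 100, 1000):
    value_gap = max(abs(smooth_relu(x, n) - relu(x)) for x in points)
    grad_gap = max(abs(smooth_relu_grad(x, n) - limit_grad(x)) for x in points)
    print(n, value_gap, grad_gap)
```

For this surrogate the function values differ from the ReLU by at most 1/(2n), and the derivative at 0 equals 0 for every n, so the limit assigns the value 0 at the point where the ReLU itself is not differentiable.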
Findings
Generalized gradients admit an approximation procedure and are limiting Fréchet subgradients of the cost function.
On smooth regions, generalized gradients coincide with standard gradients.
The analysis clarifies the mathematical foundation of gradient-based training in deep learning.
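The coincidence on smooth regions can be checked numerically on a toy example. The sketch below uses a hypothetical one-neuron network with parameters `theta = (w, b, v, c)` and a squared-error loss on a single data point; all of these names and values are assumptions for illustration, as is the convention that the ReLU derivative is set to 0 at the kink. It compares a chain-rule generalized gradient with a central finite-difference gradient at a parameter where the pre-activation is nonzero.

```python
def relu(z):
    return max(z, 0.0)


def relu_grad(z):
    # Assumed convention: the derivative is taken to be 0 at the kink z = 0.
    return 1.0 if z > 0.0 else 0.0


def loss(theta, x, y):
    """Squared error of the toy network v * relu(w * x + b) + c."""
    w, b, v, c = theta
    return (v * relu(w * x + b) + c - y) ** 2


def generalized_grad(theta, x, y):
    """Chain rule with relu_grad in place of the (missing) derivative at 0."""
    w, b, v, c = theta
    z = w * x + b
    r = 2.0 * (v * relu(z) + c - y)  # derivative of the squared error
    g = relu_grad(z)
    return [r * v * g * x, r * v * g, r * relu(z), r]


def finite_diff_grad(theta, x, y, h=1e-6):
    """Central finite differences, meaningful only where the loss is smooth."""
    grads = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        grads.append((loss(up, x, y) - loss(dn, x, y)) / (2.0 * h))
    return grads


x, y = 2.0, 1.0
theta = [0.5, 0.25, -1.5, 0.1]  # pre-activation w * x + b = 1.25, away from 0
gg = generalized_grad(theta, x, y)
fd = finite_diff_grad(theta, x, y)
max_gap = max(abs(a - b) for a, b in zip(gg, fd))
print(max_gap)

# At a kink (w * x + b = 0) the generalized gradient is still defined.
kink_grad = generalized_grad([0.5, -1.0, -1.5, 0.1], x, y)
print(kink_grad)
```

Away from the kink the two gradients agree up to floating-point error. At the kink the standard gradient need not exist, while the chain-rule expression still returns a value, which is the situation the generalized gradient is meant to cover.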
Abstract
Deep learning algorithms -- typically consisting of a class of deep artificial neural networks (ANNs) trained by a stochastic gradient descent (SGD) optimization method -- are nowadays an integral part of many areas of science, industry, and also our day-to-day life. Roughly speaking, in their most basic form, ANNs can be regarded as functions that consist of a series of compositions of affine-linear functions with multidimensional versions of so-called activation functions. One of the most popular of such activation functions is the rectified linear unit (ReLU) function x ↦ max{x, 0}. The ReLU function is, however, not differentiable and, typically, this lack of regularity transfers to the cost function of the supervised learning problem under consideration. Regardless of this lack of differentiability, deep learning practitioners apply…
Taxonomy
Topics: Neural Networks and Applications
Methods: Stochastic Gradient Descent
