Computation of Generalized Derivatives for Abs-Smooth Functions by Backward Mode Algorithmic Differentiation and Implications to Deep Learning
Lukas Baumgärtner, Franz Bethke

TL;DR
This paper demonstrates how standard algorithmic differentiation tools can correctly compute generalized derivatives for abs-smooth functions, enabling effective gradient-based training of neural networks with ReLU activations.
Contribution
It identifies an algebraic condition, based on the evaluation procedure of the objective, under which unmodified AD tools compute Clarke gradients of abs-smooth functions correctly.
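To see why any condition is needed, consider a standard counterexample that is not taken from the paper: f(x) = |x| - |-x| is identically zero, so its only Clarke gradient at x = 0 is 0. The minimal reverse-mode sketch below uses a hypothetical `Var` class and a hypothetical parameter `abs_grad_at_zero` (written s) for the value the tool assigns to d|u|/du at u = 0. The backward pass then returns 2s at the origin, which is a Clarke gradient only when s = 0.

```python
from __future__ import annotations


class Var:
    """Scalar node of a tiny reverse-mode AD tape (hypothetical, for illustration)."""

    def __init__(self, value: float, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local partial derivative)
        self.grad = 0.0

    def __neg__(self) -> Var:
        return Var(-self.value, ((self, -1.0),))

    def __sub__(self, other: Var) -> Var:
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))


def vabs(u: Var, abs_grad_at_zero: float) -> Var:
    # Away from zero, d|u|/du = sign(u); at u == 0 the tool must pick some value.
    if u.value > 0:
        d = 1.0
    elif u.value < 0:
        d = -1.0
    else:
        d = abs_grad_at_zero
    return Var(abs(u.value), ((u, d),))


def backward(out: Var) -> None:
    # Topologically order the tape, then propagate adjoints from the output.
    order, seen = [], set()

    def visit(v: Var) -> None:
        if id(v) not in seen:
            seen.add(id(v))
            for p, _ in v.parents:
                visit(p)
            order.append(v)

    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        for p, d in v.parents:
            p.grad += d * v.grad


def ad_derivative(x0: float, abs_grad_at_zero: float) -> float:
    x = Var(x0)
    # f(x) = |x| - |-x| is identically zero
    f = vabs(x, abs_grad_at_zero) - vabs(-x, abs_grad_at_zero)
    backward(f)
    return x.grad


for s in (1.0, 0.5, 0.0):
    print(f"abs'(0) := {s}: AD returns {ad_derivative(0.0, s)}")
```

For s = 1.0, 0.5 and 0.0 the loop reports 2.0, 1.0 and 0.0, so in this example only s = 0 yields the Clarke gradient 0.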
Findings
Standard AD tools suffice for generalized gradients in ReLU networks.
Correct gradient computation is guaranteed if AD chooses derivatives at zero consistently.
The approach enables reliable stochastic gradient descent with non-differentiable objectives.
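ReLU networks belong to the abs-smooth class because max(a, b) = (a + b + |a - b|)/2, so relu(x) = (x + |x|)/2. The short sketch below illustrates why the value chosen at zero matters; its function names and the parameter `abs_grad_at_zero` (written s) are hypothetical. If a tool sets d|u|/du = s at u = 0, the chain rule gives relu'(0) = (1 + s)/2. Backward mode applied to g(x) = relu(x) - relu(-x), which equals x and has Clarke gradient {1} at the origin, then returns 1 + s. Only s = 0 is correct here, while the frequently used convention relu'(0) = 0 corresponds to s = -1 and returns 0. This is a standard illustration, not a reproduction of the paper's condition.

```python
def relu_via_abs(x: float) -> float:
    # max(x, 0) written through the absolute value: (x + |x|) / 2
    return 0.5 * (x + abs(x))


def relu_grad_via_abs(x: float, abs_grad_at_zero: float) -> float:
    # Chain rule on (x + |x|) / 2 with a chosen value s for d|x|/dx at x == 0
    d_abs = 1.0 if x > 0 else -1.0 if x < 0 else abs_grad_at_zero
    return 0.5 * (1.0 + d_abs)


def ad_derivative_of_identity(abs_grad_at_zero: float) -> float:
    # g(x) = relu(x) - relu(-x) equals x, so its only Clarke gradient is 1.
    # Backward mode at x = 0: g'(0) = relu'(0) - relu'(-0) * (-1)
    s = abs_grad_at_zero
    return relu_grad_via_abs(0.0, s) + relu_grad_via_abs(-0.0, s)


for s in (-1.0, 0.0, 1.0):
    print(
        f"abs'(0) := {s}: relu'(0) = {relu_grad_via_abs(0.0, s)}, "
        f"AD derivative of g at 0 = {ad_derivative_of_identity(s)}"
    )
```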
Abstract
Algorithmic differentiation (AD) tools allow one to obtain gradient information of a continuously differentiable objective function in a computationally cheap way using the so-called backward mode. It is common practice to use the same tools even in the absence of differentiability, although the resulting vectors may not be generalized gradients in the sense of Clarke. The paper at hand focuses on objectives in which the non-differentiability arises solely from the evaluation of the absolute value function. In that case, an algebraic condition based on the evaluation procedure of the objective is identified that guarantees that Clarke gradients are correctly computed without requiring any modifications of the AD tool in question. The analysis allows one to prove that any standard AD tool is adequate to drive a stochastic generalized gradient descent method for training a dense neural network…
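The training setting mentioned in the abstract can be sketched as plain stochastic gradient descent on a tiny dense ReLU network. Here the reverse sweep is written out by hand so that the value used for relu'(0) is explicit (`at_zero`, defaulting to the common choice 0). The network width, the target function |x|, the learning rate and the step count are illustrative assumptions, not the paper's experimental setup. The hand-written sweep also stands in for whatever an AD tool would record.

```python
import math
import random


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def relu_grad(z: float, at_zero: float = 0.0) -> float:
    # Derivative used in the backward pass; `at_zero` is the value picked at the kink.
    if z > 0.0:
        return 1.0
    if z < 0.0:
        return 0.0
    return at_zero


def init_params(hidden: int, rng: random.Random) -> dict:
    return {
        "w1": [rng.gauss(0.0, 1.0) for _ in range(hidden)],
        "b1": [rng.gauss(0.0, 0.1) for _ in range(hidden)],
        "w2": [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)],
        "b2": 0.0,
    }


def forward(p: dict, x: float):
    # One hidden ReLU layer with scalar input and scalar output.
    z = [w * x + b for w, b in zip(p["w1"], p["b1"])]
    h = [relu(zj) for zj in z]
    y = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
    return z, h, y


def sgd_step(p: dict, x: float, t: float, lr: float) -> None:
    # Reverse sweep for the loss 0.5 * (y - t)^2 on a single sample.
    z, h, y = forward(p, x)
    r = y - t
    for j in range(len(z)):
        dz = r * p["w2"][j] * relu_grad(z[j])  # computed before w2[j] is updated
        p["w2"][j] -= lr * r * h[j]
        p["w1"][j] -= lr * dz * x
        p["b1"][j] -= lr * dz
    p["b2"] -= lr * r


def mse(p: dict, xs) -> float:
    return sum((forward(p, x)[2] - abs(x)) ** 2 for x in xs) / len(xs)


rng = random.Random(0)
params = init_params(hidden=16, rng=rng)
grid = [i / 50.0 - 1.0 for i in range(101)]  # evaluation points in [-1, 1]
initial_loss = mse(params, grid)
for _ in range(3000):
    x = rng.uniform(-1.0, 1.0)  # one stochastic sample per step
    sgd_step(params, x, abs(x), lr=0.05)
final_loss = mse(params, grid)
print(f"MSE before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Replacing `relu_grad` with the derivative recorded by an actual AD tool would not change the structure of the loop. The paper's result concerns when the vector such a tool returns is a genuine Clarke gradient.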
