On the complexity of nonsmooth automatic differentiation
Jérôme Bolte (TSE), Ryan Boustany (TSE), Edouard Pauwels (IRIT), Béatrice Pesquet-Popescu

TL;DR
This paper analyzes the computational complexity of nonsmooth automatic differentiation using conservative gradients, showing that backward mode can be computationally cheap and independent of dimension for certain functions, with implications for neural network training.
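To see why a notion such as the conservative gradient is needed at all, the following minimal sketch applies the chain rule formally to a nonsmooth program. It is an illustration, not code from the paper: the function names are hypothetical, and the convention relu'(0) = 0 is an assumption (it is a common choice in practice). The program computes the identity function, yet its formal derivative at 0 is 0 rather than 1.

```python
def relu(x):
    return x if x > 0 else 0.0


def relu_prime(x):
    # Assumed convention: the "derivative" of relu at the kink x = 0 is 0.
    return 1.0 if x > 0 else 0.0


def f(x):
    # relu(x) - relu(-x) equals x for every real x.
    return relu(x) - relu(-x)


def ad_derivative_f(x):
    # Chain rule applied formally, as an AD tool would do:
    # d/dx [relu(x) - relu(-x)] = relu'(x) + relu'(-x).
    return relu_prime(x) + relu_prime(-x)


value_at_zero = f(0.0)
ad_at_zero = ad_derivative_f(0.0)    # formal AD output at the kink
ad_at_two = ad_derivative_f(2.0)     # away from the kink, AD is exact
```

Away from 0 the formal derivative equals the true derivative 1; at 0 it does not, even though the function is smooth there. Conservative gradients are the tool the paper uses to give such outputs a rigorous meaning.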
Contribution
It extends the cheap gradient principle to nonsmooth functions and compares the efficiency of backward and forward modes of differentiation in this context.
Findings
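The overhead of backward mode is independent of dimension for programs built from locally Lipschitz semi-algebraic or definable elementary functions.
Backward propagation of conservative gradients is cheap, whereas known forward approaches have dimension-dependent worst-case overhead estimates.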
Finding two subgradients in the Clarke subdifferential of a function is NP-hard.
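The backward/forward contrast in the findings above can be made concrete with a toy sweep count. This is a minimal sketch under stated assumptions, not the paper's complexity model: the one-hidden-layer ReLU network, the helper names, and the convention relu'(0) = 0 are all hypothetical choices for illustration. Both routines apply the same formal chain rule, so they return the same vector; the difference is that backward mode needs one sweep while naive forward mode needs one sweep per input coordinate.

```python
def relu_prime(z):
    # Assumed convention: derivative 0 at the kink z = 0.
    return 1.0 if z > 0 else 0.0


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def backward_gradient(W, v, x):
    """Gradient of y = v . relu(W x) with one reverse sweep."""
    z = matvec(W, x)
    # Propagate the output sensitivity back through the activation.
    delta = [vi * relu_prime(zi) for vi, zi in zip(v, z)]
    # Then back through the linear layer: grad = W^T delta.
    grad = [sum(delta[i] * W[i][j] for i in range(len(W)))
            for j in range(len(x))]
    return grad, 1  # (gradient, number of sweeps)


def forward_gradient(W, v, x):
    """Same gradient, built from n directional derivatives."""
    z = matvec(W, x)
    n = len(x)
    grad = []
    for j in range(n):
        e = [1.0 if k == j else 0.0 for k in range(n)]  # basis direction
        dz = matvec(W, e)                               # tangent of W x
        grad.append(sum(vi * relu_prime(zi) * dzi
                        for vi, zi, dzi in zip(v, z, dz)))
    return grad, n  # one forward sweep per input coordinate


W = [[1.0, -2.0, 0.5], [0.0, 1.0, 1.0]]
v = [2.0, -1.0]
x = [1.0, 0.0, -1.0]

grad_b, sweeps_b = backward_gradient(W, v, x)
grad_f, sweeps_f = forward_gradient(W, v, x)
```

In this example the input has dimension 3, so the forward routine performs three sweeps to the backward routine's one; the gap grows linearly with the input dimension.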
Abstract
Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The overhead complexity of the backward mode turns out to be independent of the dimension when using programs with locally Lipschitz semi-algebraic or definable elementary functions. This considerably extends Baur-Strassen's smooth cheap gradient principle. We illustrate our results by establishing fast backpropagation results of conservative gradients through feedforward neural networks with standard activation and loss functions. Nonsmooth backpropagation's cheapness contrasts with concurrent forward approaches, which have, to this day, dimensional-dependent worst-case overhead estimates. We provide further results suggesting the superiority of backward propagation of…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices · Machine Learning and Algorithms
