Limitations of neural network training due to numerical instability of backpropagation
Clemens Karner, Vladimir Kazeev, Philipp Christian Petersen

TL;DR
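This paper investigates how the numerical instability of backpropagation in floating-point arithmetic limits the ability of gradient descent to train deep ReLU neural networks that have exponentially many affine pieces relative to their number of layers. The results indicate that networks obtained by training in practice differ substantially from the networks constructed in approximation-theoretic arguments.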
Contribution
The paper shows that, under realistic assumptions on floating-point gradient computation, the sequences of ReLU networks produced by gradient descent differ substantially from the sequences constructed in approximation theory, which points to a fundamental constraint on what training can reach.
Findings
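Numerical instability makes it highly unlikely that gradient descent maintains ReLU networks with superlinearly many affine pieces relative to their number of layers.
Approximating sequences obtained by gradient descent in practice differ substantially from the theoretically constructed sequences that yield high-order approximation rates.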
Numerical study confirms the theoretical limitations.
Abstract
We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. In virtually all approximation theoretical arguments that yield high-order polynomial rates of approximation, sequences of ReLU neural networks with exponentially many affine pieces compared to their numbers of layers are used. As a consequence, we conclude that approximating sequences of ReLU neural networks resulting from gradient descent in practice differ substantially from theoretically constructed sequences. The assumptions and the theoretical results are compared to a numerical study, which…
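To make the phrase "exponentially many affine pieces compared to their numbers of layers" concrete, the following minimal sketch composes a hat function with itself, a standard construction in the approximation-theory literature, and counts the affine pieces of the result. It is not taken from the paper and does not reproduce its instability analysis; the function names (`hat`, `sawtooth`, `count_affine_pieces`) and the grid parameter `points_per_piece` are hypothetical and chosen only for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def hat(x):
    # One ReLU layer with two neurons (illustrative assumption):
    # equals 2x on [0, 0.5] and 2 - 2x on [0.5, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def sawtooth(x, depth):
    # Compose the hat function `depth` times, i.e. a network with `depth` layers.
    for _ in range(depth):
        x = hat(x)
    return x


def count_affine_pieces(depth, points_per_piece=8):
    # Sample on a dyadic grid so every breakpoint k / 2**depth is a grid point
    # and each grid cell lies inside a single affine piece.
    n = points_per_piece * 2**depth
    x = np.arange(n + 1, dtype=np.float64) / n
    y = sawtooth(x, depth)
    slopes = np.diff(y) * n
    # A new affine piece starts wherever the slope changes between cells.
    changes = np.count_nonzero(np.abs(np.diff(slopes)) > 0.5)
    return int(changes) + 1


if __name__ == "__main__":
    for depth in range(1, 7):
        print(f"depth={depth} pieces={count_affine_pieces(depth)}")
```

On this grid the count doubles with every added layer, so a network with `depth` layers represents 2**depth affine pieces on [0, 1]. This is the kind of exponential growth that, according to the abstract, gradient descent with floating-point gradients is highly unlikely to maintain.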
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Statistical and numerical algorithms
