Non-convergence of stochastic gradient descent in the training of deep neural networks
Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

TL;DR
This paper shows that stochastic gradient descent can fail to converge when training ReLU networks whose depth is much larger than their width, unless the number of random initializations grows to infinity fast enough.
Contribution
The paper gives a rigorous analysis showing that stochastic gradient descent does not converge for ReLU networks whose depth is much larger than their width when the number of random initializations does not increase to infinity fast enough.
Findings
SGD fails to converge for ReLU networks whose depth is much larger than their width if the number of random initializations grows too slowly.
Convergence in this regime requires the number of random initializations to increase to infinity sufficiently fast.
Theoretical insights explain why training very deep networks with limited initializations can be problematic.
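As a rough illustration of why depth much larger than width can be a problem, the following sketch estimates how often a randomly initialized ReLU network already outputs zero on every sampled input. This is an illustrative experiment, not the paper's argument: the helper names, the He-style normal initialization, the zero biases, and the uniform input samples are all assumptions made here.

```python
import numpy as np

def is_constant_at_init(depth, width, rng, n_points=32):
    # Hypothetical helper: draw one random initialization and check whether
    # some hidden layer is inactive (all zeros) on every sampled input.
    x = rng.uniform(-1.0, 1.0, size=(n_points, width))
    h = x
    for _ in range(depth):
        # He-style normal weights, zero biases (an assumption of this sketch).
        W = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
        h = np.maximum(h @ W, 0.0)
        if not h.any():
            return True  # every unit is inactive on every sampled input
    return False

def dead_fraction(depth, width, n_inits=200, seed=0):
    # Fraction of random initializations that are inactive on the sample.
    rng = np.random.default_rng(seed)
    dead = sum(is_constant_at_init(depth, width, rng) for _ in range(n_inits))
    return dead / n_inits

shallow = dead_fraction(depth=1, width=2)
deep = dead_fraction(depth=100, width=2)
print(f"inactive fraction: depth 1 -> {shallow:.2f}, depth 100 -> {deep:.2f}")
```

With zero biases, a layer that is inactive on all inputs makes every later layer output zero on those inputs as well, so under these assumptions most random initializations of a very deep, narrow network start from a constant function.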
Abstract
Deep neural networks have successfully been trained in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the amount of training data; (iii) the number of gradient steps; and (iv) the number of randomly initialized gradient trajectories. While it can be shown that the approximation error converges to zero if all four parameters are sent to infinity in the right order, we demonstrate in this paper that stochastic gradient descent fails to converge for ReLU networks if their depth is much larger than their width and the number of random initializations does not increase to infinity fast enough.
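The four discretization parameters named in the abstract can be made concrete with a minimal NumPy sketch. It assumes a toy regression target, mean squared error, He-style initialization, and a "keep the best trajectory" rule; the function names and hyperparameters are hypothetical and not taken from the paper.

```python
import numpy as np

def init_params(widths, rng):
    # He-style normal weights, zero biases (an assumption of this sketch).
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(widths[:-1], widths[1:])]

def forward(params, x):
    # Returns the network output and the input to every layer.
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.maximum(z, 0.0))
    return acts[-1], acts

def mse(params, x, y):
    return float(np.mean((forward(params, x)[0] - y) ** 2))

def sgd_step(params, x, y, lr):
    # One gradient step on the mini-batch mean squared error.
    out, acts = forward(params, x)
    grad = 2.0 * (out - y) / len(x)
    new_params = []
    for i in reversed(range(len(params))):
        W, b = params[i]
        gW, gb = acts[i].T @ grad, grad.sum(axis=0)
        if i > 0:
            # Backpropagate through the ReLU that produced acts[i].
            grad = (grad @ W.T) * (acts[i] > 0)
        new_params.append((W - lr * gW, b - lr * gb))
    return new_params[::-1]

def train_with_restarts(widths, x, y, n_steps, n_inits, lr=0.05, batch=16, seed=0):
    rng = np.random.default_rng(seed)
    best_loss, best_params = np.inf, None
    for _ in range(n_inits):              # (iv) random initializations
        params = init_params(widths, rng)
        for _ in range(n_steps):          # (iii) gradient steps
            idx = rng.integers(0, len(x), size=batch)
            params = sgd_step(params, x[idx], y[idx], lr)
        loss = mse(params, x, y)
        if loss < best_loss:              # keep the best trajectory
            best_loss, best_params = loss, params
    return best_loss, best_params

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(64, 1))  # (ii) amount of training data
y = x ** 2                                # toy target, chosen for illustration
best_loss, _ = train_with_restarts([1, 8, 1], x, y,  # (i) architecture
                                   n_steps=200, n_inits=5)
print(f"best training loss over 5 initializations: {best_loss:.4f}")
```

In this sketch `widths`, `len(x)`, `n_steps`, and `n_inits` play the roles of parameters (i) to (iv). The abstract's negative result concerns the regime where `widths` describes a network much deeper than it is wide while `n_inits` grows too slowly.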
