Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
Ohad Shamir

TL;DR
This paper proves that, for one-dimensional deep linear neural networks under standard random initializations, the number of gradient descent iterations needed to converge grows exponentially with network depth, highlighting an obstacle to understanding how deep networks are trained.
Contribution
It establishes that convergence time for gradient descent scales exponentially with network depth in one-dimensional deep linear neural networks under standard initializations.
Findings
Convergence time scales exponentially with depth in 1D networks.
Empirical evidence suggests similar behavior in higher dimensions.
Highlights potential difficulties in training deep linear networks.
Abstract
We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1, \ldots, w_k$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. We also show empirically that this phenomenon can occur in higher dimensions, where each $w_i$ is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.
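The depth dependence described in the abstract can be illustrated with a small sketch. It is not the paper's experiment: it assumes a squared loss F(w) = 0.5 * (w_1 * ... * w_k - 1)^2, a deterministic initialization with every weight set to 0.5 (the paper analyzes random initializations), and a step size and tolerance chosen only for illustration. The name `gd_iterations` and all parameter values are hypothetical.

```python
import math

def gd_iterations(depth, init=0.5, lr=0.05, tol=1e-3, max_iters=1_000_000):
    """Count gradient descent steps until |prod(w) - 1| < tol.

    Objective (an assumption for this sketch): F(w) = 0.5 * (prod(w) - 1)^2,
    with `depth` scalar weights, all initialized to `init`.
    """
    w = [init] * depth
    for t in range(max_iters):
        p = math.prod(w)
        if abs(p - 1.0) < tol:
            return t
        # dF/dw_j = (p - 1) * product of all weights except w_j
        grads = [(p - 1.0) * math.prod(w[:j] + w[j + 1:]) for j in range(depth)]
        w = [wj - lr * g for wj, g in zip(w, grads)]
    return max_iters  # did not converge within the iteration budget

# Compare the number of iterations needed at a few depths.
iters = {k: gd_iterations(k) for k in (4, 8, 12)}
for k, n in iters.items():
    print(f"depth {k:2d}: {n} iterations")
```

With these settings the initial product is 0.5^k, so the gradient with respect to each weight starts out exponentially small in the depth, and the iteration count rises sharply as the depth grows. Different initial values or step sizes change the numbers but not the qualitative trend in this toy setting.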
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications
