Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

Ohad Shamir

arXiv:1809.08587 · cs.LG · June 14, 2019 · 22 cites

TL;DR

This paper proves that, for standard random initializations, gradient descent on one-dimensional deep linear neural networks needs a number of iterations that grows exponentially with network depth, which highlights a potential obstacle to understanding how gradient-based training of deep linear networks converges.

Contribution

It establishes that the number of gradient descent iterations required for convergence scales exponentially with depth in one-dimensional deep linear neural networks, under standard random initializations and mild assumptions on the objective.

Findings

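01. Under standard random initializations, the number of iterations required for convergence scales exponentially with depth in 1D networks.

02. Experiments show that the same phenomenon can occur in higher dimensions, where each weight is a matrix.

03. This is a potential obstacle to understanding the convergence of gradient-based methods for deep linear networks.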

Abstract

We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_{i})$ (with respect to scalar parameters $w_{1}, \dots, w_{k}$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. We also show empirically that this phenomenon can occur in higher dimensions, where each $w_{i}$ is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.
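To make the objective in the abstract concrete, the following is a minimal sketch of gradient descent on $f(\prod_{i=1}^{k} w_{i})$, not the paper's experimental setup. It assumes the illustrative choice $f(x) = \frac{1}{2}(x - 1)^2$, a constant step size, and a deterministic initialization in which every weight equals 0.5, used here as a reproducible stand-in for an initialization whose product shrinks with depth (the paper itself analyzes standard random initializations). The function names `gd_iterations` and `iterations_by_depth` are hypothetical. By the chain rule, the partial derivative with respect to $w_{j}$ is $f'(\prod_{i} w_{i}) \prod_{i \neq j} w_{i}$, so a small initial product makes every coordinate of the gradient small.

```python
import numpy as np


def gd_iterations(w0, lr=0.01, tol=1e-3, max_iters=1_000_000):
    """Run gradient descent on F(w) = 0.5 * (prod(w) - 1)**2.

    Returns the number of updates performed before |prod(w) - 1| < tol,
    or max_iters if that tolerance is never reached.
    """
    w = np.asarray(w0, dtype=float).copy()
    for t in range(max_iters):
        p = np.prod(w)
        if abs(p - 1.0) < tol:
            return t
        # dF/dw_j = f'(p) * prod_{i != j} w_i, with f'(p) = p - 1.
        # p / w_j is the leave-one-out product (assumes no weight is zero).
        grad = (p - 1.0) * p / w
        w -= lr * grad
    return max_iters


def iterations_by_depth(depths, init_value=0.5):
    """Count iterations for each depth k, with all k weights set to init_value.

    The constant initialization is an illustrative assumption, chosen so the
    run is deterministic; it is not one of the paper's initializations.
    """
    return {k: gd_iterations(np.full(k, init_value)) for k in depths}


if __name__ == "__main__":
    for k, n in iterations_by_depth([2, 4, 6, 8, 10]).items():
        print(f"depth {k:2d}: {n} iterations")
```

Running the script prints one iteration count per depth. Under these assumptions the initial product is $0.5^{k}$, so the early gradients shrink geometrically in $k$ and the count should grow quickly with depth; the exact numbers depend on the step size and tolerance chosen here.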

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications