Convergence of gradient descent for learning linear neural networks
Gabin Maxime Nguegnang, Holger Rauhut, Ulrich Terstiege

TL;DR
This paper analyzes the convergence of gradient descent for training deep linear neural networks under the square loss. It gives step-size conditions under which the iterates converge to a critical point, and it shows how convergence to a global minimum depends on the network depth and the initialization.
Contribution
It extends a previous convergence analysis of gradient flow to gradient descent for deep linear networks. It shows convergence to critical points and, depending on the network depth and initialization, to global minima.
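The equations below state the setting in standard notation. This notation is ours, not quoted from the paper. The network has weight matrices W_j ∈ ℝ^{d_j × d_{j−1}} for j = 1, …, N, a data matrix X ∈ ℝ^{d_0 × m}, targets Y ∈ ℝ^{d_N × m}, and step sizes η_k. Gradient descent is the explicit Euler discretization of the gradient flow studied in the earlier analysis.

```latex
\begin{aligned}
L(W_1,\dots,W_N) &= \tfrac12 \,\bigl\| W_N \cdots W_1 X - Y \bigr\|_F^2, \\
\nabla_{W_j} L &= W_{j+1}^\top \cdots W_N^\top \,\bigl(W_N \cdots W_1 X - Y\bigr)\, X^\top W_1^\top \cdots W_{j-1}^\top, \\
W_j(k+1) &= W_j(k) - \eta_k \, \nabla_{W_j} L\bigl(W_1(k),\dots,W_N(k)\bigr), \qquad j = 1,\dots,N, \\
\dot W_j(t) &= -\nabla_{W_j} L\bigl(W_1(t),\dots,W_N(t)\bigr) \qquad \text{(gradient flow)}.
\end{aligned}
```

An empty product (the left factor for j = N, the right factor for j = 1) is read as the identity matrix.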
Findings
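Gradient descent converges to a critical point of the square loss under suitable conditions on the step sizes.
For two-layer networks, gradient descent converges to a global minimum for almost all initializations.
For networks with three or more layers, gradient descent converges to a global minimum of the loss restricted to a manifold of matrices of some fixed rank, which cannot be determined a priori.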
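The two-layer case can be made concrete with a small NumPy sketch of gradient descent on the square loss ½‖W_N⋯W_1X − Y‖_F². A small constant step size stands in for the paper's step-size conditions, which the sketch does not check. The function names (`end_to_end`, `loss`, `gradients`, `gradient_descent`), the sizes, and the Gaussian initialization are all hypothetical choices for illustration, not the authors' code. With a hidden width of at least min(d_0, d_2), every matrix is a possible product, so the loss can never drop below the ordinary least-squares error.

```python
import numpy as np


def end_to_end(weights):
    """Return W_N ... W_1 for weights = [W_1, ..., W_N]."""
    prod = weights[0]
    for W in weights[1:]:
        prod = W @ prod
    return prod


def loss(weights, X, Y):
    """Square loss 1/2 * ||W_N ... W_1 X - Y||_F^2."""
    R = end_to_end(weights) @ X - Y
    return 0.5 * float(np.sum(R * R))


def gradients(weights, X, Y):
    """Gradient of the square loss with respect to every factor."""
    R = end_to_end(weights) @ X - Y  # residual, shape (d_N, m)
    grads = []
    for j, W in enumerate(weights):
        A = np.eye(W.shape[0])  # product of the layers above this one (identity for the top layer)
        for V in weights[j + 1:]:
            A = V @ A
        B = np.eye(X.shape[0])  # product of the layers below this one (identity for the bottom layer)
        for V in weights[:j]:
            B = V @ B
        grads.append(A.T @ R @ X.T @ B.T)
    return grads


def gradient_descent(weights, X, Y, eta, steps):
    """Constant-step gradient descent; returns final weights and the loss history."""
    weights = [W.copy() for W in weights]
    history = [loss(weights, X, Y)]
    for _ in range(steps):
        grads = gradients(weights, X, Y)  # every gradient is taken at the current iterate
        weights = [W - eta * G for W, G in zip(weights, grads)]
        history.append(loss(weights, X, Y))
    return weights, history


# Hypothetical two-layer example: d_0 = 3, d_1 = 4, d_2 = 2, m = 10 samples.
rng = np.random.default_rng(0)
dims = [3, 4, 2]
X = rng.standard_normal((dims[0], 10))
Y = rng.standard_normal((dims[-1], 10))
weights0 = [0.5 * rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
weights, history = gradient_descent(weights0, X, Y, eta=1e-3, steps=2000)
print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

To judge how close a two-layer run is to a global minimum, compare `history[-1]` with the least-squares error of `Y @ X.T @ np.linalg.inv(X @ X.T)`. This comparison is valid when `X @ X.T` is invertible.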
Abstract
We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on the step sizes, gradient descent converges to a critical point of the loss function, which in this article is the square loss. Furthermore, we demonstrate that in the case of two layers, gradient descent converges to a global minimum for almost all initializations. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
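The rank statement for three or more layers can be probed numerically. The sketch below runs gradient descent and inspects the singular values of the end-to-end matrix. It assumes X = I, a hypothetical three-layer network with a width-2 bottleneck, Gaussian initialization, and a fixed step size. `product` and `gd_step` are illustrative names. The bottleneck only bounds the rank of W_3W_2W_1 from above (here by 2), and the sketch does not predict which rank a given run ends on. The only guarantee it relies on is that the loss cannot drop below the best rank-2 approximation error of Y.

```python
import numpy as np


def product(ws):
    """Return W_N ... W_1 for a nonempty list ws = [W_1, ..., W_N]."""
    P = np.eye(ws[0].shape[1])
    for W in ws:
        P = W @ P
    return P


def gd_step(ws, X, Y, eta):
    """One gradient descent step on 1/2 * ||W_N ... W_1 X - Y||_F^2."""
    R = product(ws) @ X - Y  # residual at the current iterate
    new = []
    for j, W in enumerate(ws):
        # Layers above and below W; an empty product is the identity.
        left = product(ws[j + 1:]) if j + 1 < len(ws) else np.eye(W.shape[0])
        right = product(ws[:j]) if j > 0 else np.eye(W.shape[1])
        new.append(W - eta * left.T @ R @ X.T @ right.T)
    return new


# Hypothetical three-layer example with a width-2 bottleneck.
rng = np.random.default_rng(1)
dims = [4, 2, 2, 4]
X = np.eye(4)
Y = rng.standard_normal((4, 4))
ws = [0.5 * rng.standard_normal((dims[i + 1], dims[i])) for i in range(3)]
losses = []
for _ in range(4000):
    losses.append(0.5 * float(np.sum((product(ws) @ X - Y) ** 2)))
    ws = gd_step(ws, X, Y, eta=0.005)

P = product(ws)
print("singular values of W_3 W_2 W_1:", np.round(np.linalg.svd(P, compute_uv=False), 4))
print("numerical rank:", np.linalg.matrix_rank(P, tol=1e-6))
```

The `tol` passed to `np.linalg.matrix_rank` is a heuristic cutoff. After finitely many steps, a singular value may still be decaying toward zero, so the reported rank can depend on this threshold.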
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Matrix Theory and Algorithms · Sparse and Compressive Sensing Techniques
