Gradient descent aligns the layers of deep linear networks
Ziwei Ji, Matus Telgarsky

TL;DR
This paper proves that gradient flow and gradient descent on deep linear networks, trained on linearly separable data, drive the risk to zero and align the layers; for the logistic loss, the network's linear function converges in direction to the maximum margin solution.
Contribution
It establishes risk convergence and implicit regularization effects, including layer alignment and maximum margin convergence, for gradient flow and gradient descent on deep linear networks.
Findings
Risk converges to zero for gradient flow on strictly decreasing loss functions with linearly separable data.
Normalized weight matrices asymptotically become rank-1 and aligned across adjacent layers.
For the logistic loss, the network's linear function converges in direction to the maximum margin solution.
Abstract
This paper establishes risk convergence and asymptotic weight matrix alignment --- a form of implicit regularization --- of gradient flow and gradient descent when applied to deep linear networks on linearly separable data. In more detail, for gradient flow applied to strictly decreasing loss functions (with similar results for gradient descent with particular decreasing step sizes): (i) the risk converges to 0; (ii) the normalized i-th weight matrix asymptotically equals its rank-1 approximation $u_i v_i^\top$; (iii) these rank-1 matrices are aligned across layers, meaning $|v_{i+1}^\top u_i| \to 1$. In the case of the logistic loss (binary cross entropy), more can be said: the linear function induced by the network --- the product of its weight matrices --- converges to the same direction as the maximum margin solution. This last property was identified in prior work, but only under…
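The three quantities named in the abstract (risk, rank-1 structure, and adjacent-layer alignment) can be measured numerically. The following is a minimal NumPy sketch, not the authors' code: the toy data, the depth and width, the random initialization, and the constant step size are all assumptions made for illustration (the abstract refers to particular decreasing step sizes for gradient descent), and every function name is hypothetical. Here $u_i$ and $v_i$ are taken to be the top left and right singular vectors of the i-th weight matrix.

```python
import numpy as np

def make_separable_data(n=40, margin=0.2, seed=0):
    """Toy linearly separable data in R^2 with labels in {-1, +1} (an assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    X = X[np.abs(X[:, 0]) >= margin]                 # leave a gap around x_0 = 0
    X /= max(1.0, np.linalg.norm(X, axis=1).max())   # scale so every |x_i| <= 1
    y = np.sign(X[:, 0])
    return X, y

def product(Ws):
    """Linear map computed by the network, W_L ... W_1 (a 1 x d matrix)."""
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P
    return P

def risk(Ws, X, y):
    """Empirical logistic risk of the predictor x -> (W_L ... W_1) x."""
    margins = y * (X @ product(Ws).ravel())
    return float(np.mean(np.logaddexp(0.0, -margins)))

def gradients(Ws, X, y):
    """Gradient of the logistic risk with respect to each weight matrix."""
    margins = y * (X @ product(Ws).ravel())
    # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)), written in a numerically stable form
    coeff = -y * np.exp(-np.logaddexp(0.0, margins))
    G = (coeff @ X)[None, :] / len(y)                # gradient w.r.t. the product, (1, d)
    grads = []
    for k in range(len(Ws)):
        below = np.eye(X.shape[1])
        for W in Ws[:k]:
            below = W @ below                        # W_{k-1} ... W_1
        above = np.eye(Ws[-1].shape[0])
        for W in reversed(Ws[k + 1:]):
            above = above @ W                        # W_L ... W_{k+1}
        grads.append(above.T @ G @ below.T)
    return grads

def diagnostics(Ws, X, y):
    """Risk, sigma_1 / ||W_i||_F per layer, and |v_{i+1}^T u_i| per adjacent pair."""
    rank1, us, vs = [], [], []
    for W in Ws:
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        rank1.append(float(S[0] / np.linalg.norm(W)))  # equals 1 exactly when W is rank 1
        us.append(U[:, 0])
        vs.append(Vt[0])
    align = [float(abs(vs[i + 1] @ us[i])) for i in range(len(Ws) - 1)]
    return {"risk": risk(Ws, X, y), "rank1": rank1, "align": align}

def train(depth=3, width=4, steps=10000, lr=0.05, init_scale=0.1, seed=0):
    """Plain gradient descent with a constant step size (an assumption of this sketch)."""
    X, y = make_separable_data(seed=seed)
    rng = np.random.default_rng(seed + 1)
    dims = [X.shape[1]] + [width] * (depth - 1) + [1]
    Ws = [init_scale * rng.standard_normal((dims[i + 1], dims[i])) for i in range(depth)]
    before = diagnostics(Ws, X, y)
    for _ in range(steps):
        Ws = [W - lr * g for W, g in zip(Ws, gradients(Ws, X, y))]
    return before, diagnostics(Ws, X, y)

if __name__ == "__main__":
    before, after = train()
    print("before:", before)
    print("after: ", after)
```

Comparing the `rank1` and `align` entries before and after training indicates how far each layer has moved toward a rank-1 matrix and how well adjacent layers line up. The statements in the abstract are asymptotic, so after a finite number of steps these values need not be close to 1.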
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Advanced Neuroimaging Techniques and Applications
