Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks

Peter L. Bartlett; David P. Helmbold; Philip M. Long

arXiv:1802.06093·cs.LG·June 19, 2018


TL;DR

This paper studies how deep linear residual networks, trained by gradient descent from identity initialization, learn positive definite linear transformations. It gives bounds on the number of iterations needed to converge and identifies cases where gradient descent succeeds and cases where it fails.

Contribution

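The paper gives a theoretical analysis of gradient descent on deep linear networks initialized at the identity. It proves polynomial iteration bounds when the initial hypothesis has excess loss below a small enough constant, and it identifies target matrices for which gradient descent fails to converge.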

Findings

01. Polynomial convergence bounds for positive definite matrices

02. Failure of gradient descent for matrices with negative eigenvalues

03. Regularization towards identity does not always improve convergence

Abstract

We analyze algorithms for approximating a function $f(x) = \Phi x$ mapping $\Re^{d}$ to $\Re^{d}$ using deep linear neural networks, i.e., that learn a function $h$ parameterized by matrices $\Theta_{1}, \dots, \Theta_{L}$ and defined by $h(x) = \Theta_{L} \Theta_{L-1} \cdots \Theta_{1} x$. We focus on algorithms that learn through gradient descent on the population quadratic loss in the case that the distribution over the inputs is isotropic. We provide polynomial bounds on the number of iterations for gradient descent to approximate the least squares matrix $\Phi$, in the case where the initial hypothesis $\Theta_{1} = \dots = \Theta_{L} = I$ has excess loss bounded by a small enough constant. On the other hand, we show that gradient descent fails to converge for $\Phi$ whose distance from the identity is a larger constant, and we show that some forms of regularization toward the identity in each layer do…
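
The setup in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes that, for isotropic inputs, the population quadratic loss equals the squared Frobenius distance between the product of the layers and the target matrix, up to scaling (the factor 1/2 below is a convention chosen here). The function names (`chain`, `loss`, `gd_identity_init`), the depth, the step size, the iteration count, and the two diagonal target matrices are all hypothetical choices made for illustration.

```python
import numpy as np


def chain(mats, d):
    """Return M_k ... M_1 for the list [M_1, ..., M_k]; identity if empty."""
    out = np.eye(d)
    for m in mats:
        out = m @ out
    return out


def loss(thetas, phi):
    """Assumed population loss: 0.5 * ||Theta_L ... Theta_1 - Phi||_F^2."""
    d = phi.shape[0]
    return 0.5 * np.linalg.norm(chain(thetas, d) - phi, "fro") ** 2


def gd_identity_init(phi, depth, step, iters):
    """Plain gradient descent on all layers, starting from Theta_i = I."""
    d = phi.shape[0]
    thetas = [np.eye(d) for _ in range(depth)]
    for _ in range(iters):
        resid = chain(thetas, d) - phi
        grads = []
        for i in range(depth):
            above = chain(thetas[i + 1:], d)  # layers applied after layer i
            below = chain(thetas[:i], d)      # layers applied before layer i
            # Gradient of the loss with respect to layer i.
            grads.append(above.T @ resid @ below.T)
        # Update every layer simultaneously with the same step size.
        thetas = [t - step * g for t, g in zip(thetas, grads)]
    return thetas


# Illustrative targets (assumptions, not taken from the paper):
phi_pd = np.diag([2.0, 0.5])    # symmetric positive definite
phi_neg = np.diag([-1.0, 1.0])  # symmetric with a negative eigenvalue

final_pd = loss(gd_identity_init(phi_pd, depth=4, step=0.01, iters=2000), phi_pd)
final_neg = loss(gd_identity_init(phi_neg, depth=4, step=0.01, iters=2000), phi_neg)
print("final loss, positive definite target:", final_pd)
print("final loss, negative-eigenvalue target:", final_neg)
```

In this toy example all layers stay diagonal with positive entries, so with an even depth the product can never produce the negative eigenvalue of the second target and the loss stays bounded away from zero, while the loss for the positive definite target shrinks toward zero. The sketch covers only plain gradient descent on two small diagonal examples, not the specific algorithms or regularizers analyzed in the paper.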
