Deep linear networks for regression are implicitly regularized towards flat minima
Pierre Marion, Lénaïc Chizat

TL;DR
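This paper shows that deep linear networks for univariate regression are implicitly regularized towards flat minima: the sharpness of any minimizer is bounded below by a quantity that grows linearly with depth, and gradient flow finds a minimizer whose sharpness is at most a constant times that bound, where the constant depends on the condition number of the data covariance matrix but not on width or depth.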
Contribution
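It establishes a lower bound on the sharpness of minimizers that grows linearly with depth, and shows that gradient flow converges to a minimizer whose sharpness is within a constant factor of this bound. The result is proven for two initializations and supported by numerical validation.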
Findings
Minimizers have sharpness bounded below by a linear function of depth.
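Gradient flow finds a flat minimizer: its sharpness is at most a constant times the lower bound, with a constant that depends on the condition number of the data covariance matrix but not on width or depth.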
Results hold for both small-scale and residual initializations.
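
The first finding can be illustrated on a toy case. The sketch below is not the paper's setting or its bound: it assumes scalar inputs and a width-1 network x -> w_L ... w_1 x, for which the Hessian at any minimizer is s * g g^T, with s the mean of x^2 and g_k the product of all weights except w_k. The AM-GM inequality then gives sharpness >= s * L * |p*|^(2 - 2/L), where p* is the optimal product of the weights. All function names and data values are hypothetical.

```python
def product(ws):
    """Product of the scalar weights, i.e. the end-to-end linear map."""
    p = 1.0
    for w in ws:
        p *= w
    return p

def second_moment(xs):
    """Mean of x^2: the 1-D analogue of the data covariance."""
    return sum(x * x for x in xs) / len(xs)

def loss(ws, xs, ys):
    """Half mean squared error of the scalar deep linear network."""
    p = product(ws)
    return sum((y - p * x) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def sharpness_at_minimizer(ws, xs):
    """Top Hessian eigenvalue, valid only at a minimizer with nonzero weights.

    There the Hessian equals s * g g^T with g_k = product(ws) / w_k,
    so its largest eigenvalue is s * ||g||^2.
    """
    p = product(ws)
    return second_moment(xs) * sum((p / w) ** 2 for w in ws)

def depth_lower_bound(p_star, depth, xs):
    """AM-GM lower bound on the sharpness of any minimizer (toy scalar case)."""
    return second_moment(xs) * depth * abs(p_star) ** (2.0 - 2.0 / depth)

def balanced_minimizer(p_star, depth):
    """All weights equal; assumes p_star > 0."""
    return [p_star ** (1.0 / depth)] * depth

def unbalanced_minimizer(p_star, depth, c):
    """Same product p_star, but one very large and one very small weight."""
    return [c, p_star / c] + [1.0] * (depth - 2)

if __name__ == "__main__":
    xs = [1.0, 2.0]      # illustrative data, assumed
    p_star = 2.0         # targets are y = p_star * x
    for depth in (2, 4, 8):
        flat = sharpness_at_minimizer(balanced_minimizer(p_star, depth), xs)
        sharp = sharpness_at_minimizer(unbalanced_minimizer(p_star, depth, 10.0), xs)
        print(depth, depth_lower_bound(p_star, depth, xs), flat, sharp)
```

In this toy case the balanced minimizer attains the bound, while increasing c in unbalanced_minimizer makes the sharpness arbitrarily large without changing the loss.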
Abstract
The largest eigenvalue of the Hessian, or sharpness, of neural networks is a key quantity to understand their optimization dynamics. In this paper, we study the sharpness of deep linear networks for univariate regression. Minimizers can have arbitrarily large sharpness, but not an arbitrarily small one. Indeed, we show a lower bound on the sharpness of minimizers, which grows linearly with depth. We then study the properties of the minimizer found by gradient flow, which is the limit of gradient descent with vanishing learning rate. We show an implicit regularization towards flat minima: the sharpness of the minimizer is no more than a constant times the lower bound. The constant depends on the condition number of the data covariance matrix, but not on width or depth. This result is proven both for a small-scale initialization and a residual initialization. Results of independent…
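
As a rough numerical illustration of the implicit regularization described in the abstract, the sketch below runs gradient descent with a small learning rate (a stand-in for gradient flow) on a toy problem. It assumes a width-1 network with scalar weights, so the loss reduces to 0.5 * s * (w_L ... w_1 - p*)^2, where s is the mean of x^2 and p* is the optimal product. This is not the paper's general setting; the initialization, learning rate, step count and function names are all illustrative assumptions.

```python
def product(ws):
    """Product of the scalar weights, i.e. the end-to-end linear map."""
    p = 1.0
    for w in ws:
        p *= w
    return p

def gradient_descent(ws, p_star, s, lr=0.01, steps=20000):
    """Small-step gradient descent on 0.5 * s * (product(ws) - p_star)**2."""
    ws = list(ws)
    for _ in range(steps):
        p = product(ws)
        # dLoss/dw_k = s * (p - p_star) * prod_{j != k} w_j, written as p / w_k
        grads = [s * (p - p_star) * p / w for w in ws]
        ws = [w - lr * g for w, g in zip(ws, grads)]
    return ws

def sharpness_at_minimizer(ws, s):
    """Top Hessian eigenvalue s * sum_k (p / w_k)^2, valid at a minimizer."""
    p = product(ws)
    return s * sum((p / w) ** 2 for w in ws)

def depth_lower_bound(p_star, depth, s):
    """AM-GM lower bound on the sharpness of any minimizer (toy scalar case)."""
    return s * depth * abs(p_star) ** (2.0 - 2.0 / depth)

if __name__ == "__main__":
    s, p_star = 1.0, 2.0        # assumed second moment and target product
    init = [0.1, 0.2, 0.3]      # small-scale initialization, assumed
    ws = gradient_descent(init, p_star, s)
    ratio = sharpness_at_minimizer(ws, s) / depth_lower_bound(p_star, len(ws), s)
    print("weights:", ws)
    print("sharpness / lower bound:", ratio)
```

In this scalar case gradient flow conserves the differences w_i^2 - w_j^2, so a small initialization ends at a nearly balanced minimizer whose sharpness is close to the lower bound.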
Taxonomy
Topics: Neural Networks and Applications
