Deep linear networks for regression are implicitly regularized towards flat minima

Pierre Marion, Lénaïc Chizat

arXiv:2405.13456 · stat.ML · October 29, 2024

TL;DR

This paper shows that deep linear networks for univariate regression are implicitly regularized towards flat minima, with sharpness bounds that depend on the data covariance and on depth, which has implications for both optimization and generalization.
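To make the objects in this summary concrete, the setting can be written out as below. The notation, the layer shapes and the 1/(2n) normalization are our own assumptions and may differ from the paper's conventions. A depth-L linear network maps x ∈ R^d to a scalar, and its sharpness is the largest eigenvalue of the Hessian of the training loss with respect to all weights at once. Because f_θ is linear in x, it equals ⟨w, x⟩ for the end-to-end vector w = (W_L ⋯ W_1)^⊤. The loss therefore depends on θ only through w, whereas the sharpness also depends on how w is factorized across the layers.

```latex
f_\theta(x) = W_L W_{L-1} \cdots W_1 x, \qquad
W_1 \in \mathbb{R}^{m \times d},\;
W_2, \dots, W_{L-1} \in \mathbb{R}^{m \times m},\;
W_L \in \mathbb{R}^{1 \times m},

R(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2,
\qquad
S(\theta) = \lambda_{\max}\bigl(\nabla^2_\theta R(\theta)\bigr).
```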

Contribution

It establishes a lower bound on the sharpness of minimizers that grows linearly with depth. It also shows that gradient flow is implicitly biased towards flat minima, with proofs for both small-scale and residual initializations and supporting numerical experiments.
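The mechanism behind a depth-dependent lower bound is visible in a toy case chosen purely for illustration. This is not the paper's general theorem, and its constants may differ. Take width m = 1 and a single input feature with second moment a, so that the end-to-end predictor is p = w_1 ⋯ w_L and the target coefficient is w*. The gradient is a(p − w*)g with g_i = ∏_{k≠i} w_k. At any global minimizer the residual term of the Hessian vanishes, which leaves the rank-one matrix a g gᵀ. AM-GM then bounds its top eigenvalue from below:

```latex
R(w) = \frac{a}{2}\,\bigl(p - w^\star\bigr)^2,
\qquad p = \prod_{k=1}^{L} w_k,
\qquad p_i = \prod_{k \neq i} w_k,

\text{at a minimizer } (p = w^\star \neq 0):\quad
\nabla^2 R(w) = a\, g g^\top,\quad g = (p_1, \dots, p_L),
\qquad S(w) = a \sum_{i=1}^{L} p_i^2,

\prod_{i=1}^{L} p_i = p^{L-1}
\;\Longrightarrow\;
\sum_{i=1}^{L} p_i^2 \;\ge\; L \Bigl(\prod_{i=1}^{L} p_i^2\Bigr)^{1/L}
  = L\, |w^\star|^{2 - 2/L}
\;\Longrightarrow\;
S(w) \ge a\, L\, |w^\star|^{2 - 2/L}.
```

Equality holds exactly for balanced minimizers, where |w_1| = ⋯ = |w_L| = |w*|^{1/L}. In the other direction, assume L ≥ 2 and set w_1 = t, w_2 = w*/t and w_k = 1 for k ≥ 3. This keeps p = w* but gives p_2 = t, so S ≥ a t² is unbounded. That matches the abstract's remark that minimizers can be arbitrarily sharp but not arbitrarily flat.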

Findings

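01. Minimizers have sharpness bounded below by a quantity that grows linearly with depth.

02. Gradient flow converges to flat minima. The sharpness of the minimizer it finds is at most a constant times this lower bound, where the constant depends on the condition number of the data covariance but not on width or depth.

03. The results hold for both small-scale and residual initializations.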
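A quick numerical check of the first finding, and of the fact that minimizers can be arbitrarily sharp, is sketched below in a toy model of our own choosing rather than the paper's general setting. The model is a width-1 chain of L scalar weights fitting a scalar target w_star with input second moment a = 1, so the loss is (a/2)(w_1⋯w_L − w_star)². At a global minimizer the Hessian reduces to a·g gᵀ with g_i = ∏_{k≠i} w_k, so the sharpness is a‖g‖². The helper names, the target w_star = 2 and the rescaling factor t = 10 are hypothetical choices made for illustration.

```python
import math


def chain_sharpness_at_minimizer(weights, a=1.0):
    """Top Hessian eigenvalue of R(w) = (a/2)(prod(w) - w_star)^2 at a global minimizer.

    At a minimizer the residual term vanishes, so the Hessian equals a * g g^T
    with g_i = prod_{k != i} w_k, and its largest eigenvalue is a * ||g||^2.
    """
    total = 0.0
    for i in range(len(weights)):
        p_i = math.prod(w for k, w in enumerate(weights) if k != i)
        total += p_i ** 2
    return a * total


def lower_bound(depth, w_star, a=1.0):
    """Toy-case minimal sharpness a * L * |w_star|^(2 - 2/L), attained when balanced."""
    return a * depth * abs(w_star) ** (2 - 2 / depth)


w_star = 2.0
t = 10.0  # hypothetical rescaling factor for an unbalanced minimizer
for depth in (2, 4, 8, 16):
    # Balanced minimizer: every layer equals w_star^(1/L).
    balanced = [w_star ** (1 / depth)] * depth
    # Same end-to-end product, but two layers rescaled by t and 1/t.
    unbalanced = [balanced[0] * t, balanced[1] / t] + balanced[2:]
    print(
        depth,
        round(chain_sharpness_at_minimizer(balanced), 3),
        round(lower_bound(depth, w_star), 3),
        round(chain_sharpness_at_minimizer(unbalanced), 3),
    )
```

In this toy model the balanced factorization attains a·L·|w_star|^(2−2/L), which grows roughly linearly in L. Rescaling two layers by t and 1/t leaves the product unchanged but inflates the sharpness by a term of order t².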

Abstract

The largest eigenvalue of the Hessian, or sharpness, of neural networks is a key quantity to understand their optimization dynamics. In this paper, we study the sharpness of deep linear networks for univariate regression. Minimizers can have arbitrarily large sharpness, but not an arbitrarily small one. Indeed, we show a lower bound on the sharpness of minimizers, which grows linearly with depth. We then study the properties of the minimizer found by gradient flow, which is the limit of gradient descent with vanishing learning rate. We show an implicit regularization towards flat minima: the sharpness of the minimizer is no more than a constant times the lower bound. The constant depends on the condition number of the data covariance matrix, but not on width or depth. This result is proven both for a small-scale initialization and a residual initialization. Results of independent…
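The implicit-regularization claim can be illustrated in the same toy width-1 chain, which is our own simplification and not the paper's experiment. The sketch runs small-step gradient descent as a stand-in for gradient flow from two starts: a small-scale one, and a scalar analogue of a residual initialization with all weights near 1. The initial values, learning rate, step count and helper names are hypothetical choices. For each start it reports the end-to-end product and the sharpness of the point reached divided by the toy lower bound a·L·|w_star|^(2−2/L).

```python
import math


def prod_except(weights, i):
    """Product of all scalar layers except layer i."""
    return math.prod(w for k, w in enumerate(weights) if k != i)


def gradient(weights, w_star, a=1.0):
    """Gradient of R(w) = (a/2) * (prod(w) - w_star)^2 for a width-1 linear chain."""
    residual = math.prod(weights) - w_star
    return [a * residual * prod_except(weights, i) for i in range(len(weights))]


def gradient_descent(weights, w_star, lr=1e-2, steps=20_000, a=1.0):
    """Small-step gradient descent, used as a discrete stand-in for gradient flow."""
    w = list(weights)
    for _ in range(steps):
        g = gradient(w, w_star, a)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def sharpness_near_minimizer(weights, a=1.0):
    """a * ||g||^2: the top Hessian eigenvalue once the residual term has vanished."""
    return a * sum(prod_except(weights, i) ** 2 for i in range(len(weights)))


def lower_bound(depth, w_star, a=1.0):
    """Toy-case minimal sharpness a * L * |w_star|^(2 - 2/L)."""
    return a * depth * abs(w_star) ** (2 - 2 / depth)


w_star = 2.0
inits = {
    "small-scale": [0.10, 0.12, 0.09, 0.11],    # hypothetical small initialization
    "residual-like": [1.00, 1.05, 0.95, 1.02],  # hypothetical near-identity start
}
results = {}
for name, w0 in inits.items():
    w_final = gradient_descent(w0, w_star)
    ratio = sharpness_near_minimizer(w_final) / lower_bound(len(w_final), w_star)
    results[name] = (math.prod(w_final), ratio)
    print(f"{name}: product={math.prod(w_final):.6f}, sharpness/bound={ratio:.4f}")
```

In this scalar model, gradient flow conserves the differences w_i² − w_j² between layers, because w_i·∂R/∂w_i = a(p − w_star)p is the same for every layer. Starts that are nearly balanced therefore end near the balanced, minimal-sharpness factorization, and the printed ratio stays close to 1. With a single scalar input the data covariance has condition number 1, so this toy case cannot show the condition-number dependence described in the abstract.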

Code & Models

Repositories

pierremarion23/implicit-reg-sharpness (JAX, official)

Videos

Deep linear networks for regression are implicitly regularized towards flat minima (SlidesLive)

Taxonomy

Topics: Neural Networks and Applications