The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

Khashayar Gatmiry; Zhiyuan Li; Ching-Yao Chuang; Sashank Reddi; Tengyu Ma; Stefanie Jegelka

arXiv:2306.13239·cs.LG·June 26, 2023

TL;DR

This paper investigates why flatness regularization improves generalization in deep linear networks, showing it promotes low Schatten 1-norm solutions under RIP conditions, with empirical validation on synthetic data.

Contribution

The paper takes a first step toward a theoretical account of flatness regularization in deep matrix factorization: it links minimizing the trace of the Hessian to Schatten 1-norm minimization, which helps explain the generalization benefit.

Findings

01. Minimizing Hessian trace approximates Schatten 1-norm minimization.

02. Flatness regularization leads to better generalization in deep linear networks.

03. Empirical results support the theoretical connection on synthetic datasets.
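
The first finding can be illustrated numerically in the simplest case. The sketch below is not taken from the paper: it assumes a depth-2 factorization `W2 @ W1`, i.i.d. Gaussian measurement matrices as a stand-in for RIP measurements, and a mean-squared loss normalization. All names (`schatten_1_norm`, `hessian_trace_at_zero_loss`, the sizes `d`, `rank`, `n`) are hypothetical. At a zero-loss solution the residuals vanish, so the Hessian trace reduces to the sum of squared gradient norms of the measurement functions, which is what the helper computes.

```python
import numpy as np

def schatten_1_norm(M):
    """Schatten 1-norm (nuclear norm): the sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def hessian_trace_at_zero_loss(W1, W2, measurements):
    """Trace of the Hessian of
        loss(W1, W2) = mean_i (<A_i, W2 @ W1> - b_i)^2
    evaluated at a point where every residual is zero.
    There the Hessian equals its Gauss-Newton part, so the trace is
    (2/n) * sum_i ||grad_i||^2 with grad_i the gradient of <A_i, W2 @ W1>.
    """
    total = 0.0
    for A in measurements:
        grad_W1 = W2.T @ A   # gradient of <A, W2 @ W1> with respect to W1
        grad_W2 = A @ W1.T   # gradient of <A, W2 @ W1> with respect to W2
        total += np.sum(grad_W1 ** 2) + np.sum(grad_W2 ** 2)
    return 2.0 * total / len(measurements)

# Assumed toy setup: a rank-2 ground truth and Gaussian measurements.
rng = np.random.default_rng(0)
d, rank, n = 5, 2, 200
M = rng.standard_normal((d, rank)) @ rng.standard_normal((rank, d))
measurements = rng.standard_normal((n, d, d))

# Two exact factorizations of the same matrix M (both have zero loss).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
W2_bal = U * np.sqrt(s)              # balanced: ||W1||_F^2 = ||W2||_F^2 = ||M||_*
W1_bal = np.sqrt(s)[:, None] * Vt
c = 3.0
W2_unb, W1_unb = c * W2_bal, W1_bal / c   # same product, unbalanced scaling

trace_bal = hessian_trace_at_zero_loss(W1_bal, W2_bal, measurements)
trace_unb = hessian_trace_at_zero_loss(W1_unb, W2_unb, measurements)
nuc = schatten_1_norm(M)

print("Schatten 1-norm of M:        ", nuc)
print("Hessian trace (balanced):    ", trace_bal)
print("Hessian trace (unbalanced):  ", trace_unb)
print("balanced trace / (4*d*||M||_*):", trace_bal / (4 * d * nuc))
```

Under these assumptions the expected Hessian trace is `2 * d * (||W1||_F^2 + ||W2||_F^2)`, which is at least `4 * d * ||M||_*` for any factorization of `M`. The balanced factorization attains that bound in expectation, so its trace should sit close to `4 * d * ||M||_*`, while the rescaled factorization is noticeably sharper even though both fit the measurements exactly.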

Abstract

Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions. More explicit forms of flatness regularization also empirically improve the generalization performance. However, it remains unclear why and when flatness regularization leads to better generalization. This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in an important setting: learning deep linear networks from linear measurements, also known as deep matrix factorization. We show that for all depths greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Matrix Theory and Algorithms · Neural Networks and Applications