Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks

Ke Chen; Chugang Yi; Haizhao Yang

arXiv:2410.02176·cs.LG·October 4, 2024

TL;DR

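This paper shows that training a ReLU neural network with SGD and weight decay drives its weight matrix toward an approximately rank-two matrix, and that this low-rank bias can be used to obtain improved generalization error bounds. The analysis does not rely on common assumptions about the training data distribution, optimality of the weight matrices, or specific training procedures.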

Contribution

A proof that a ReLU neural network sufficiently trained with SGD and weight decay has an approximately rank-two weight matrix, supported by empirical evidence on both regression and classification tasks.

Findings

01

Weight decay leads to low-rank weight matrices in neural networks.

02

Empirically, weight decay is a necessary condition for inducing this low-rank bias, in both regression and classification tasks.

03

Leveraging the low-rank bias yields improved generalization error bounds, and numerical evidence shows that better generalization can be achieved.
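One way to check whether a weight matrix is "approximately rank two" is to inspect its singular values. The sketch below is illustrative only and is not taken from the paper: the helper names `numerical_rank` and `rank_k_residual`, the relative tolerance, and the synthetic matrix `W` (an exact rank-two product plus small noise, standing in for a trained weight matrix) are all assumptions.

```python
import numpy as np

def numerical_rank(W, rel_tol=1e-2):
    """Count singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, sorted descending
    return int(np.sum(s > rel_tol * s[0]))

def rank_k_residual(W, k=2):
    """Relative Frobenius distance from W to its best rank-k approximation."""
    s = np.linalg.svd(W, compute_uv=False)
    return float(np.sqrt(np.sum(s[k:] ** 2) / np.sum(s ** 2)))

rng = np.random.default_rng(0)
# Hypothetical stand-in for a trained weight matrix: rank two plus small noise.
U = rng.standard_normal((64, 2))
V = rng.standard_normal((2, 32))
W = U @ V + 1e-4 * rng.standard_normal((64, 32))

print("numerical rank:", numerical_rank(W))
print("relative distance to rank two:", rank_k_residual(W, k=2))
```

A small value of `rank_k_residual(W, 2)` means the matrix is close to rank two in Frobenius norm. The choice of `rel_tol` is a judgment call, and the paper may quantify approximate rank differently.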

Abstract

We study the implicit bias towards low-rank weight matrices when training neural networks (NN) with Weight Decay (WD). We prove that when a ReLU NN is sufficiently trained with Stochastic Gradient Descent (SGD) and WD, its weight matrix is approximately a rank-two matrix. Empirically, we demonstrate that WD is a necessary condition for inducing this low-rank bias across both regression and classification tasks. Our work differs from previous studies as our theoretical analysis does not rely on common assumptions regarding the training data distribution, optimality of weight matrices, or specific training procedures. Furthermore, by leveraging the low-rank bias, we derive improved generalization error bounds and provide numerical evidence showing that better generalization can be achieved. Thus, our work offers both theoretical and empirical insights into the strong generalization…
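The training setup named in the abstract, a ReLU network trained with SGD and weight decay, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' experimental code. The two-layer architecture, the squared-error loss, the toy target, and all hyperparameters are hypothetical choices, and nothing here is claimed to reproduce the paper's results.

```python
import numpy as np

def sgd_wd_step(W, grad, lr, wd):
    """One SGD step on loss + (wd/2)*||W||_F^2, i.e. W <- W - lr*(grad + wd*W)."""
    return W - lr * (grad + wd * W)

def train_two_layer_relu(X, Y, hidden=32, lr=1e-2, wd=1e-3,
                         steps=2000, batch=16, seed=0):
    """Minibatch SGD on 0.5 * mean squared error for Y ~ relu(X @ W1) @ W2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.standard_normal((d, hidden)) / np.sqrt(d)
    W2 = rng.standard_normal((hidden, Y.shape[1])) / np.sqrt(hidden)
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)  # sample a minibatch
        xb, yb = X[idx], Y[idx]
        pre = xb @ W1                   # pre-activations
        h = np.maximum(pre, 0.0)        # ReLU
        err = h @ W2 - yb               # residual, shape (batch, out)
        g2 = h.T @ err / batch          # gradient with respect to W2
        g1 = xb.T @ ((err @ W2.T) * (pre > 0)) / batch  # gradient w.r.t. W1
        W1 = sgd_wd_step(W1, g1, lr, wd)
        W2 = sgd_wd_step(W2, g2, lr, wd)
    return W1, W2

rng = np.random.default_rng(1)
X = rng.standard_normal((128, 8))
Y = np.sin(X[:, :1])                    # hypothetical toy regression target
W1, W2 = train_two_layer_relu(X, Y, steps=200)
print(np.linalg.svd(W1, compute_uv=False))  # singular values of the first layer
```

Rerunning with `wd=0.0` and comparing the singular-value spectra is one way to probe the necessity claim on a toy problem. How long "sufficiently trained" is, and which weight matrix the theorem concerns, are specified in the paper rather than here.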

Taxonomy

Topics: Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent · Weight Decay