Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks
Ke Chen, Chugang Yi, Haizhao Yang

TL;DR
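This paper investigates how weight decay drives neural network weight matrices toward low rank (approximately rank two for a sufficiently trained ReLU network) and how that bias improves generalization, without relying on common assumptions about the training data distribution, optimality of the weight matrices, or specific training procedures.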
Contribution
A proof that a ReLU network sufficiently trained with SGD and weight decay has an approximately rank-two weight matrix, supported by empirical evidence on regression and classification tasks.
Findings
Weight decay leads to low-rank weight matrices in neural networks.
Empirically, weight decay is a necessary condition for inducing this low-rank bias in both regression and classification tasks.
Leveraging the low-rank bias yields improved generalization error bounds, and numerical evidence shows that better generalization can be achieved.
Abstract
We study the implicit bias towards low-rank weight matrices when training neural networks (NN) with Weight Decay (WD). We prove that when a ReLU NN is sufficiently trained with Stochastic Gradient Descent (SGD) and WD, its weight matrix is approximately a rank-two matrix. Empirically, we demonstrate that WD is a necessary condition for inducing this low-rank bias across both regression and classification tasks. Our work differs from previous studies as our theoretical analysis does not rely on common assumptions regarding the training data distribution, optimality of weight matrices, or specific training procedures. Furthermore, by leveraging the low-rank bias, we derive improved generalization error bounds and provide numerical evidence showing that better generalization can be achieved. Thus, our work offers both theoretical and empirical insights into the strong generalization…
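To make the two central quantities in the abstract concrete, here is a minimal NumPy sketch of an SGD update with weight decay and of one way to measure how close a weight matrix is to rank two. It is an illustration under stated assumptions, not the authors' code: the L2-penalty form of weight decay, the helper names sgd_weight_decay_step and top_k_energy, and the energy-fraction metric are all choices made for this example, and the paper may define approximate rank differently.

```python
import numpy as np

def sgd_weight_decay_step(W, grad, lr=0.1, wd=1e-2):
    """One SGD step on loss + (wd / 2) * ||W||_F^2 (assumed L2-penalty form).

    The penalty contributes wd * W to the gradient, so every step also
    shrinks W toward zero.
    """
    return W - lr * (grad + wd * W)

def top_k_energy(W, k=2):
    """Fraction of ||W||_F^2 captured by the k largest singular values.

    A value of 1.0 means W has rank at most k; values near 1.0 mean W is
    approximately rank k in the Frobenius sense.
    """
    s = np.linalg.svd(W, compute_uv=False)  # returned in descending order
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))

rng = np.random.default_rng(0)

# An exactly rank-two matrix: all of its energy sits in two singular values.
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
rank_two_energy = top_k_energy(A, k=2)

# With a zero data gradient, weight decay alone rescales W by (1 - lr * wd).
W0 = rng.standard_normal((8, 6))
W1 = sgd_weight_decay_step(W0, np.zeros_like(W0), lr=0.1, wd=0.5)
```

In a real experiment, grad would be a minibatch gradient of the training loss, and top_k_energy would be tracked over training for each weight matrix, with and without weight decay, to compare how the spectrum evolves.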
Taxonomy
Topics: Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent · Weight Decay
