The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks
Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

TL;DR
This paper investigates how implicit bias and data properties influence the generalization ability of two-layer linear neural networks trained with gradient flow, shedding light on the phenomenon of benign overfitting.
Contribution
It provides theoretical bounds on excess risk for such networks, highlighting the roles of initialization quality and data covariance in benign overfitting.
Findings
Bounds on excess risk depend on data covariance and initialization.
The analysis builds on recent characterizations of the implicit bias of gradient flow for interpolating two-layer linear networks.
Data properties like sub-Gaussianity are crucial for theoretical guarantees.
Abstract
The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of benign overfitting has attracted intense theoretical and empirical study. In this paper, we consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk when the covariates satisfy sub-Gaussianity and anti-concentration properties, and the noise is independent and sub-Gaussian. By leveraging recent results that characterize the implicit bias of this estimator, our bounds emphasize the role of both the quality of the initialization as well as the properties of the data covariance matrix in achieving low excess risk.
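To make the setting in the abstract concrete, here is a minimal NumPy sketch, not the authors' code. It fits a two-layer linear network f(x) = a^T W x to noisy data with more features than samples and reports the excess risk of the resulting end-to-end predictor. Several choices are assumptions made only for illustration: small-step gradient descent stands in for gradient flow, the covariates are Gaussian with a diagonal covariance whose eigenvalues decay as 1/i, and the width, initialization scale, step size, and the names `train_two_layer_linear`, `excess_risk`, and `run_demo` are all hypothetical. Whether the run gets close to interpolation depends on the step size and number of steps.

```python
import numpy as np


def excess_risk(theta, theta_star, cov_eigs):
    """Excess risk (theta - theta*)^T Sigma (theta - theta*) for diagonal Sigma."""
    diff = theta - theta_star
    return float(np.sum(cov_eigs * diff ** 2))


def train_two_layer_linear(X, y, width=20, init_scale=1.0, lr=0.005,
                           steps=4000, seed=0):
    """Gradient descent on the squared loss for f(x) = a^T W x.

    Small-step gradient descent is an assumed stand-in for gradient flow.
    Returns the end-to-end predictor W^T a and the training-loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random initialization (scale is an illustrative assumption).
    W = init_scale * rng.standard_normal((width, d)) / np.sqrt(d)
    a = init_scale * rng.standard_normal(width) / np.sqrt(width)
    losses = []
    for _ in range(steps):
        theta = W.T @ a                   # end-to-end linear predictor, shape (d,)
        resid = X @ theta - y             # training residuals, shape (n,)
        losses.append(0.5 * np.mean(resid ** 2))
        g = X.T @ resid / n               # gradient of the loss w.r.t. theta
        grad_W = np.outer(a, g)           # chain rule through theta = W^T a
        grad_a = W @ g
        W = W - lr * grad_W
        a = a - lr * grad_a
    theta = W.T @ a
    losses.append(0.5 * np.mean((X @ theta - y) ** 2))
    return theta, np.array(losses)


def run_demo(n=20, d=100, noise_std=0.1, seed=0):
    """Overparameterized regression (d > n) with a decaying covariance spectrum."""
    rng = np.random.default_rng(seed)
    cov_eigs = 1.0 / np.arange(1, d + 1)              # assumed spectrum of Sigma
    X = rng.standard_normal((n, d)) * np.sqrt(cov_eigs)
    theta_star = np.zeros(d)
    theta_star[0] = 1.0                               # assumed ground-truth signal
    y = X @ theta_star + noise_std * rng.standard_normal(n)
    theta, losses = train_two_layer_linear(X, y)
    return {
        "theta": theta,
        "losses": losses,
        "excess_risk": excess_risk(theta, theta_star, cov_eigs),
    }


if __name__ == "__main__":
    out = run_demo()
    print("initial train loss:", out["losses"][0])
    print("final train loss:  ", out["losses"][-1])
    print("excess risk:       ", out["excess_risk"])
```

Changing `init_scale` or the decay of `cov_eigs` in this sketch is one way to explore informally the two factors the bounds emphasize, the quality of the initialization and the data covariance. The sketch does not reproduce any bound from the paper.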
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning
