The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks

Niladri S. Chatterji; Philip M. Long; Peter L. Bartlett

arXiv:2108.11489 · stat.ML · September 13, 2022

TL;DR

This paper investigates how implicit bias and data properties influence the generalization ability of two-layer linear neural networks trained with gradient flow, shedding light on the phenomenon of benign overfitting.

Contribution

It provides theoretical bounds on excess risk for such networks, highlighting the roles of initialization quality and data covariance in benign overfitting.

Findings

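1. The excess-risk bounds depend on both the data covariance matrix and the quality of the initialization.

2. The bounds build on recent results that characterize the implicit bias of gradient flow for this estimator.

3. The guarantees assume covariates that satisfy sub-Gaussianity and anti-concentration properties, and noise that is independent and sub-Gaussian.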

Abstract

The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of benign overfitting has attracted intense theoretical and empirical study. In this paper, we consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk when the covariates satisfy sub-Gaussianity and anti-concentration properties, and the noise is independent and sub-Gaussian. By leveraging recent results that characterize the implicit bias of this estimator, our bounds emphasize the role of both the quality of the initialization as well as the properties of the data covariance matrix in achieving low excess risk.
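To make the setting in the abstract concrete, here is a minimal NumPy sketch of an overparameterized two-layer linear network fitted to noisy labels. It is an illustration under stated assumptions, not the paper's construction: gradient flow is approximated by gradient descent with a small step size, the covariates are isotropic Gaussian (one simple sub-Gaussian choice), and the function name `train_two_layer_linear`, the width, the initialization scale, and all other hyperparameters are hypothetical.

```python
import numpy as np


def train_two_layer_linear(X, y, width=200, lr=0.01, steps=3000, seed=0):
    """Fit f(x) = v^T W x on the squared loss with small-step gradient descent.

    Small-step gradient descent is used here as a discrete stand-in for
    gradient flow (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Hypothetical random initialization; the paper's bounds depend on the
    # initialization, and this particular scale is only an illustrative choice.
    W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, d))  # first layer
    v = rng.normal(scale=1.0 / np.sqrt(width), size=width)       # second layer
    losses = []
    for _ in range(steps):
        theta = W.T @ v                    # end-to-end linear map x -> <theta, x>
        resid = X @ theta - y              # training residuals
        losses.append(0.5 * np.mean(resid ** 2))
        g = X.T @ resid / n                # gradient of the loss w.r.t. theta
        grad_W = np.outer(v, g)            # chain rule through theta = W^T v
        grad_v = W @ g
        W -= lr * grad_W
        v -= lr * grad_v
    return W.T @ v, losses


# Synthetic data: more dimensions than samples, so interpolation is possible.
rng = np.random.default_rng(1)
n, d, noise_std = 10, 50, 0.5
theta_star = rng.normal(size=d) / np.sqrt(d)          # ground-truth linear map
X = rng.normal(size=(n, d))                           # isotropic Gaussian covariates
y = X @ theta_star + noise_std * rng.normal(size=n)   # noisy labels

theta_hat, losses = train_two_layer_linear(X, y)
train_loss = 0.5 * np.mean((X @ theta_hat - y) ** 2)
# With identity covariance, the excess risk of a linear predictor is the
# squared Euclidean distance between the learned and true parameters.
excess_risk = np.sum((theta_hat - theta_star) ** 2)
print(f"final training loss: {train_loss:.2e}")
print(f"excess risk (identity covariance): {excess_risk:.3f}")
```

The training loss measures how closely the network interpolates the noisy labels, while the excess risk measures how far the learned end-to-end map is from the true one. An isotropic covariance is used only for simplicity; the abstract indicates that the properties of the covariance matrix matter for whether the excess risk is low, so this sketch should not be read as a demonstration of benign overfitting.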

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning