Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data

Spencer Frei; Niladri S. Chatterji; Peter L. Bartlett

arXiv:2202.05928 · cs.LG · July 4, 2025 · 5 cites

Open Access

TL;DR

This paper demonstrates that two-layer neural networks trained with gradient descent can perfectly fit noisy data yet still generalize well, even in nonlinear settings with adversarial label noise, revealing benign overfitting beyond linear models.

Contribution

The paper provides the first analysis of benign overfitting in nonlinear neural networks trained by gradient descent on noisy data, extending understanding beyond linear and kernel methods.

Findings

01

Two-layer neural networks can interpolate noisy training labels and still achieve minimax optimal test error.

02

Benign overfitting occurs in nonlinear neural networks trained with gradient descent.

03

The analysis applies to data from well-separated class-conditional log-concave distributions in which an adversary may corrupt a constant fraction of the training labels.

Abstract

Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that requires linear or kernel-based predictors, our…
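
The setting in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code, and several choices are assumptions made only for illustration: Gaussian clusters at ±mu (one example of a log-concave class-conditional distribution), uniformly random label flips rather than an adversary, a leaky ReLU activation, a fixed second layer with only the first layer trained, and the particular values of `n`, `d`, `mu`, `lr`, and `steps`.

```python
import numpy as np

GAMMA = 0.1  # leaky ReLU slope (assumed; not specified on this page)

def make_data(n, d, mu, flip_frac, rng):
    """Balanced Gaussian clusters at +/- mu, with a fixed fraction of labels flipped."""
    y_clean = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    X = y_clean[:, None] * mu[None, :] + rng.standard_normal((n, d))
    y_noisy = y_clean.copy()
    n_flip = int(round(flip_frac * n))
    flip_idx = rng.choice(n, size=n_flip, replace=False)  # random flips, not adversarial
    y_noisy[flip_idx] *= -1.0
    return X, y_clean, y_noisy

def leaky_relu(z):
    return np.where(z > 0, z, GAMMA * z)

def leaky_relu_grad(z):
    return np.where(z > 0, 1.0, GAMMA)

def forward(W, a, X):
    """Two-layer network f(x) = sum_j a_j * phi(<w_j, x>)."""
    return leaky_relu(X @ W.T) @ a

def train(X, y, rng, m=50, steps=200, lr=0.005, init_std=1e-4):
    """Full-batch gradient descent on the logistic loss, first layer only."""
    n, d = X.shape
    W = init_std * rng.standard_normal((m, d))                    # small random init
    a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / np.sqrt(m)   # fixed second layer
    for _ in range(steps):
        Z = X @ W.T                                   # pre-activations, shape (n, m)
        margins = y * (leaky_relu(Z) @ a)
        g = 0.5 * (1.0 - np.tanh(margins / 2.0))      # equals 1 / (1 + exp(margin))
        # Negative gradient of the mean logistic loss with respect to W.
        coeff = (g * y)[:, None] * leaky_relu_grad(Z) * a[None, :]
        W += lr * (coeff.T @ X) / n
    return W, a

rng = np.random.default_rng(0)
n, d, flip_frac = 30, 10_000, 0.1      # dimension much larger than sample size
mu = np.zeros(d)
mu[0] = 10.0                           # cluster mean with norm 10

X, y_clean, y_noisy = make_data(n, d, mu, flip_frac, rng)
W, a = train(X, y_noisy, rng)

# Training error is measured against the noisy labels the network was fit to.
train_err = np.mean(np.sign(forward(W, a, X)) != y_noisy)

# Test error is measured against clean labels on fresh samples.
X_te, y_te, _ = make_data(500, d, mu, 0.0, rng)
test_err = np.mean(np.sign(forward(W, a, X_te)) != y_te)

print(f"flipped labels: {int(np.sum(y_noisy != y_clean))} of {n}")
print(f"train error on noisy labels: {train_err:.3f}")
print(f"test error on clean labels:  {test_err:.3f}")
```

The parameters are chosen so that the dimension is much larger than the sample size, which heuristically lets each training point, including the flipped ones, be fit through its own nearly orthogonal noise direction while the shared mean direction still carries the signal. Shrinking `d` or the norm of `mu` is a simple way to explore when this behavior breaks down.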

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis