Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise
Spencer Frei, Yuan Cao, Quanquan Gu

TL;DR
This paper proves that one-hidden-layer leaky ReLU networks of any width, trained by SGD from an arbitrary initialization, generalize in the presence of adversarial label noise: their classification accuracy is competitive with that of the best halfspace for a broad class of distributions.
Contribution
To the authors' knowledge, it gives the first theoretical guarantee that overparameterized neural networks trained by SGD can generalize when the data is corrupted with adversarial label noise.
Findings
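SGD-trained one-hidden-layer leaky ReLU networks achieve classification accuracy competitive with that of the best halfspace over the distribution.
The guarantee holds for a broad class of distributions that includes log-concave isotropic and hard margin distributions.
To the authors' knowledge, this is the first proof of generalization under adversarial label noise for overparameterized networks trained by SGD.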
Abstract
We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization. We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution for a broad class of distributions that includes log-concave isotropic and hard margin distributions. Equivalently, such networks can generalize when the data distribution is linearly separable but corrupted with adversarial label noise, despite the capacity to overfit. To the best of our knowledge, this is the first work to show that overparameterized neural networks trained by SGD can generalize when the data is corrupted with adversarial label noise.
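The setting in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' experimental setup or proof construction. The hidden-layer weights are trained by online SGD on the logistic loss, while the fixed second-layer weights `a`, the leaky ReLU slope `ALPHA`, the Gaussian data, the width, the learning rate, and the rule used to flip labels are all illustrative choices that do not come from the paper. The flip rule is one fixed, non-random corruption pattern standing in for an adversary; the reference halfspace is the one that generated the clean labels.

```python
import numpy as np

ALPHA = 0.1  # leaky ReLU slope for negative inputs (illustrative choice)


def leaky_relu(z):
    return np.where(z > 0, z, ALPHA * z)


def leaky_relu_grad(z):
    return np.where(z > 0, 1.0, ALPHA)


def predict(W, a, X):
    """Network output f(x) = sum_j a_j * leaky_relu(<w_j, x>) for each row of X."""
    return leaky_relu(X @ W.T) @ a


def sample(n, rng, d=10):
    """Isotropic Gaussian inputs; clean labels come from the halfspace x[0] >= 0.

    Assumption: labels are flipped wherever x[1] > 1.2816, which covers
    roughly 10% of the Gaussian mass. This is a hypothetical corruption rule.
    """
    X = rng.standard_normal((n, d))
    y = np.where(X[:, 0] >= 0, 1.0, -1.0)
    flip = X[:, 1] > 1.2816
    y[flip] *= -1.0
    return X, y


def train_sgd(X, y, m, lr, rng):
    """One pass of online SGD on the logistic loss over the hidden weights W."""
    d = X.shape[1]
    W = 0.1 * rng.standard_normal((m, d))  # one arbitrary initialization
    # Assumption: second layer is fixed at +-1/sqrt(m) and never trained.
    a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / np.sqrt(m)
    for x_i, y_i in zip(X, y):
        pre = W @ x_i
        margin = y_i * (a @ leaky_relu(pre))
        # Derivative of log(1 + exp(-z)) at z = margin, in a stable form.
        dloss = -0.5 * (1.0 - np.tanh(margin / 2.0))
        grad_W = dloss * y_i * (a * leaky_relu_grad(pre))[:, None] * x_i[None, :]
        W -= lr * grad_W
    return W, a


rng = np.random.default_rng(0)
X_train, y_train = sample(5000, rng)
X_test, y_test = sample(2000, rng)

W, a = train_sgd(X_train, y_train, m=50, lr=0.05, rng=rng)

# Test error against the corrupted labels, for the network and the reference halfspace.
net_err = float(np.mean(np.sign(predict(W, a, X_test)) != y_test))
halfspace_err = float(np.mean(np.where(X_test[:, 0] >= 0, 1.0, -1.0) != y_test))

print(f"network error on noisy labels:   {net_err:.3f}")
print(f"halfspace error on noisy labels: {halfspace_err:.3f}")
```

Comparing the two printed error rates gives a rough empirical picture of what "competitive with the best halfspace" means; changing `m` lets one check how the comparison behaves across widths. A single run like this is only an illustration and says nothing about the constants or rates in the paper's guarantee.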
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Machine Learning and Data Classification
Methods: Stochastic Gradient Descent
