Generalization and Stability of Interpolating Neural Networks with Minimal Width

Hossein Taheri; Christos Thrampoulidis

arXiv:2302.09235·stat.ML·March 29, 2023·1 cites

TL;DR

This paper analyzes the optimization and generalization behavior of shallow neural networks trained with gradient descent in the interpolating regime, and identifies conditions under which they achieve near-optimal test error with minimal width.

Contribution

It introduces a novel analysis framework for shallow neural networks showing convergence and generalization bounds comparable to convex models, even with minimal width.

Findings

1. Gradient descent achieves low training and generalization error with a polylogarithmic number of hidden neurons.

2. A test loss of order $1/n$ is achieved with $\log^{4}(n)$ neurons and $n/\log^{4}(n)$ iterations.

3. The results improve on existing stability-based bounds, which require larger network widths.

Abstract

We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error $\epsilon$ and their distance from initialization is $g(\epsilon)$, we demonstrate that gradient descent with $n$ training data achieves training error $O(g(1/T)^{2}/T)$ and generalization error $O(g(1/T)^{2}/n)$ at iteration $T$, provided there are at least $m = \Omega(g(1/T)^{4})$ hidden neurons. We then show that our realizable setting encompasses a special case where data are separable by the model's neural tangent kernel. For this and logistic-loss minimization, we prove the training loss decays at a rate of $\tilde{O}(1/T)$ given polylogarithmic number of neurons $m = \Omega(\log^{4}(T))$. Moreover, with $m = \Omega(\log^{4}(n))$ …
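The setting described in the abstract can be illustrated with a small numerical sketch: full-batch gradient descent on the hidden layer of a two-layer network under logistic loss, on data that are separable with a margin. Everything specific here is an assumption made for illustration and is not taken from the paper: the `tanh` activation, the fixed random ±1 outer layer, the $1/\sqrt{m}$ output scaling, the synthetic linearly separable data, and the helper names `make_separable_data` and `train_shallow_net`. The sketch only shows the training loss decreasing over iterations; it does not verify the stated rates.

```python
import numpy as np


def make_separable_data(n=200, d=5, margin=0.1, seed=1):
    """Synthetic unit-norm points, linearly separable with a margin (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    w_star = np.zeros(d)
    w_star[0] = 1.0
    scores = X @ w_star
    keep = np.abs(scores) > margin          # drop points too close to the boundary
    return X[keep], np.sign(scores[keep])


def train_shallow_net(X, y, m=64, T=300, lr=1.0, seed=0):
    """Gradient descent on the hidden weights of f(x) = a^T tanh(Wx) / sqrt(m)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((m, d))         # trainable hidden-layer weights
    a = rng.choice([-1.0, 1.0], size=m)     # outer layer kept fixed (assumption)
    losses = []
    for _ in range(T):
        H = np.tanh(X @ W.T)                # hidden activations, shape (n, m)
        f = H @ a / np.sqrt(m)              # network outputs, shape (n,)
        margins = y * f
        # Mean logistic loss log(1 + exp(-y f)), computed stably.
        losses.append(np.mean(np.logaddexp(0.0, -margins)))
        # dL/df_i = -(y_i / n) * sigmoid(-margin_i); sigmoid(-z) = (1 - tanh(z/2)) / 2.
        g_f = -y * 0.5 * (1.0 - np.tanh(margins / 2.0)) / n
        # Chain rule: df_i/dW_j = a_j * (1 - H_ij^2) * x_i / sqrt(m).
        G = ((g_f[:, None] * (1.0 - H ** 2)) * a[None, :]).T @ X / np.sqrt(m)
        W -= lr * G
    return W, a, np.array(losses)


X, y = make_separable_data()
W, a, losses = train_shallow_net(X, y)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Plotting `losses` against the iteration count on a log-log scale is one way to compare the empirical decay with the $\tilde{O}(1/T)$ training-loss rate quoted above, keeping in mind that this toy setup is not the paper's exact model.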

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications

Methods: Test