Generalization and Stability of Interpolating Neural Networks with Minimal Width
Hossein Taheri, Christos Thrampoulidis

TL;DR
This paper analyzes the optimization and generalization of shallow neural networks trained with gradient descent in the interpolating regime, identifying conditions under which they achieve near-optimal test error with minimal width.
Contribution
It introduces a novel analysis framework for shallow neural networks showing convergence and generalization bounds comparable to convex models, even with minimal width.
Findings
Gradient descent achieves low training and generalization error with only polylogarithmically many hidden neurons.
A test loss bound of $\tilde{O}(1/n)$ is achieved with $m = \Omega(\log^4(n))$ neurons and $T \approx n$ iterations.
Results outperform existing stability-based bounds requiring larger network widths.
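To get a feel for the width scaling in the findings above, the short sketch below evaluates log^4(n) for a few sample sizes. It is only an illustration: the helper name `polylog_width` is hypothetical, natural logarithms are assumed, and all constants hidden in the asymptotic notation are dropped, so the numbers are not widths prescribed by the paper.

```python
import math

def polylog_width(n: int) -> float:
    """Return log^4(n) with all constants dropped (illustrative assumption)."""
    return math.log(n) ** 4

# Compare the polylogarithmic width scale with the sample size n
# and with the 1/n test-loss scale.
for n in (10**3, 10**6, 10**9):
    print(f"n={n:>10}  log^4(n)={polylog_width(n):>10.0f}  1/n={1 / n:.1e}")
```

For small samples such as n = 10^3, log^4(n) is still larger than n itself, so the "minimal width" statement is best read as an asymptotic one; the gap between log^4(n) and n only opens up as n grows.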
Abstract
We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error $\epsilon$ and their distance from initialization is $g(\epsilon)$, we demonstrate that gradient descent with $n$ training data achieves training error $O(g(1/T)^2/T)$ and generalization error $O(g(1/T)^2/n)$ at iteration $T$, provided there are at least $m=\Omega(g(1/T)^4)$ hidden neurons. We then show that our realizable setting encompasses a special case where data are separable by the model's neural tangent kernel. For this and logistic-loss minimization, we prove the training loss decays at a rate of $\tilde{O}(1/T)$ given polylogarithmic number of neurons $m=\Omega(\log^4(T))$. Moreover, with …
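The training setup described in the abstract (a shallow network, full-batch gradient descent, logistic loss) can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: the tanh activation, the fixed random-sign outer layer, the 1/sqrt(m) scaling, the synthetic linearly separable data, and every hyperparameter value are assumptions chosen for the example, and the function name `train_shallow_net` is hypothetical.

```python
import numpy as np

def train_shallow_net(X, y, m=64, lr=0.5, steps=500, seed=0):
    """Full-batch gradient descent on the logistic loss of a two-layer net.

    Assumed model: f(x) = (1/sqrt(m)) * sum_j a_j * tanh(<w_j, x>),
    with fixed signs a_j in {-1, +1}; only the hidden weights W are trained.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((m, d))        # hidden weights at initialization
    a = rng.choice([-1.0, 1.0], size=m)    # fixed outer layer (assumption)
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W.T)               # (n, m) hidden activations
        f = H @ a / np.sqrt(m)             # (n,) network outputs
        margins = y * f
        # Logistic loss: mean of log(1 + exp(-y_i * f_i)), computed stably.
        losses.append(np.mean(np.logaddexp(0.0, -margins)))
        # Derivative of the loss with respect to each output f_i.
        g = -y / (1.0 + np.exp(margins)) / n
        # Chain rule: d f_i / d w_j = a_j * (1 - H_ij^2) * x_i / sqrt(m).
        G = ((g[:, None] * (1.0 - H**2)) * a[None, :]).T @ X / np.sqrt(m)
        W -= lr * G                        # gradient-descent step
    return W, a, np.array(losses)

# Synthetic, linearly separable data on the unit sphere (assumption).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.where(X[:, 0] >= 0, 1.0, -1.0)

W, a, losses = train_shallow_net(X, y)
print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}")
```

Only the hidden layer is trained here, which keeps the gradient a single matrix expression; the recorded `losses` array can be plotted against the iteration count to inspect how the training loss decays in this toy setting.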
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
Methods: Test
