Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Hossein Taheri, Christos Thrampoulidis, Arya Mazumdar

TL;DR
This paper provides tighter, data-dependent convergence and generalization guarantees for neural network classifiers trained with gradient methods, improving bounds and analyzing the role of initialization and step-size.
Contribution
It introduces novel excess risk bounds for deep networks, improves test error bounds under NTK assumptions, and demonstrates the impact of large step-sizes on classification performance.
Findings
Tighter bounds on excess risk for neural networks with smooth activation.
Improved test error bounds for NTK-feature separable data.
Large step-size SGD achieves rapid perfect classification on the XOR distribution.
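To make the training setup behind these findings concrete, here is a minimal sketch in pure Python. It is not the paper's experiment: it assumes a four-point XOR-type dataset, a two-layer network with softplus activation and a fixed second layer, and full-batch gradient descent (used here instead of SGD so the run is deterministic). The names `train`, `predict`, and `DATA`, and the width, step size, and step count, are illustrative choices.

```python
import math
import random


def softplus(z):
    # Smooth activation log(1 + e^z), computed in a numerically stable way.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # Derivative of softplus, computed in a numerically stable way.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_loss(t):
    # Logistic loss log(1 + e^{-t}) evaluated at the margin t = y * f(x).
    return softplus(-t)


# Assumed toy data: four sign patterns, label = product of the two signs.
DATA = [((1.0, 1.0), 1.0), ((-1.0, -1.0), 1.0),
        ((1.0, -1.0), -1.0), ((-1.0, 1.0), -1.0)]


def predict(W, a, x):
    # Two-layer network f(x) = (1 / sqrt(m)) * sum_j a_j * softplus(w_j . x).
    m = len(W)
    hidden = (a[j] * softplus(W[j][0] * x[0] + W[j][1] * x[1]) for j in range(m))
    return sum(hidden) / math.sqrt(m)


def empirical_loss(W, a):
    return sum(logistic_loss(y * predict(W, a, x)) for x, y in DATA) / len(DATA)


def train(width=8, step_size=1.0, steps=200, seed=0):
    rng = random.Random(seed)
    # Gaussian initialization of the trained first layer (an assumption).
    W = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(width)]
    # Fixed second layer with alternating signs; only W is trained.
    a = [1.0 if j % 2 == 0 else -1.0 for j in range(width)]
    losses = []
    for _ in range(steps):
        losses.append(empirical_loss(W, a))  # loss before this update
        grad = [[0.0, 0.0] for _ in range(width)]
        for x, y in DATA:
            # Chain rule: d/dt log(1 + e^{-t}) = -sigmoid(-t), with t = y * f(x).
            dl = -sigmoid(-y * predict(W, a, x)) * y / len(DATA)
            for j in range(width):
                pre = W[j][0] * x[0] + W[j][1] * x[1]
                g = dl * a[j] * sigmoid(pre) / math.sqrt(width)
                grad[j][0] += g * x[0]
                grad[j][1] += g * x[1]
        for j in range(width):
            W[j][0] -= step_size * grad[j][0]
            W[j][1] -= step_size * grad[j][1]
    return W, a, losses


if __name__ == "__main__":
    W, a, losses = train()
    accuracy = sum(1 for x, y in DATA if y * predict(W, a, x) > 0) / len(DATA)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}, accuracy {accuracy:.2f}")
```

Softplus is chosen because an odd activation such as tanh, without bias terms, gives f(-x) = -f(x) and so cannot fit labels that are symmetric under x → -x. Changing `step_size` or `seed` in `train` is a simple way to explore how step size and initialization affect the loss curve in this toy setting.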
Abstract
In this paper, we study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation. Our first result is a novel bound on the excess risk of deep networks trained with the logistic loss, via an algorithmic stability analysis. Compared to previous works, our results improve upon the shortcomings of the well-established Rademacher complexity-based bounds. Importantly, the bounds we derive in this paper are tighter, hold even for neural networks of small width, do not scale unfavorably with width, are algorithm-dependent, and consequently capture the role of initialization on the sample complexity of gradient descent for deep nets. Specialized to noiseless data separable with margin by neural tangent kernel (NTK) features of a network of width , we show the test-error rate to be…
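For readers unfamiliar with the setup, the quantities the abstract refers to can be written in standard notation. The symbols below ($\Phi$, $w$, $\eta$, $n$, $T$) are assumptions chosen for illustration and may differ from the paper's own notation; the block states only textbook definitions, not the paper's bounds.

```latex
% Training data (x_i, y_i), i = 1, ..., n, with labels y_i in {-1, +1};
% \Phi(w, x) denotes the network output with parameters w.
\begin{align*}
  \widehat{F}(w) &= \frac{1}{n} \sum_{i=1}^{n} \log\!\bigl(1 + e^{-y_i \Phi(w, x_i)}\bigr)
    && \text{(empirical logistic loss)} \\
  F(w) &= \mathbb{E}_{(x, y)}\Bigl[\log\!\bigl(1 + e^{-y \Phi(w, x)}\bigr)\Bigr]
    && \text{(test loss)} \\
  w_{t+1} &= w_t - \eta \, \nabla \widehat{F}(w_t)
    && \text{(gradient descent with step size } \eta\text{)} \\
  F(w_T) - \widehat{F}(w_T) &
    && \text{(generalization gap after } T \text{ steps)}
\end{align*}
```

An algorithmic stability analysis bounds the generalization gap by how much the iterate $w_T$ changes when a single training sample is replaced, which is why the resulting bounds depend on the algorithm and its initialization rather than only on the function class.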
Taxonomy
Topics: Neural Networks and Applications · Advanced Data Processing Techniques
Methods: Stochastic Gradient Descent · Neural Tangent Kernel · Sparse Evolutionary Training
