Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods

Hossein Taheri; Christos Thrampoulidis; Arya Mazumdar

arXiv:2410.10024·cs.LG·December 9, 2024

Open Access

TL;DR

This paper derives tighter, data-dependent convergence and generalization guarantees for neural network classifiers trained with gradient methods, and analyzes how initialization and step-size affect those guarantees.

Contribution

It introduces novel excess risk bounds for deep networks, improves test error bounds under NTK assumptions, and demonstrates the impact of large step-sizes on classification performance.

Findings

01 Tighter bounds on excess risk for neural networks with smooth activation.

02 Improved test error bounds for NTK-feature separable data.

03 Large step-size SGD achieves rapid perfect classification on the XOR distribution.

Abstract

In this paper, we study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation. Our first result is a novel bound on the excess risk of deep networks trained by the logistic loss, via an algorithmic stability analysis. Compared to previous works, our results improve upon the shortcomings of the well-established Rademacher complexity-based bounds. Importantly, the bounds we derive in this paper are tighter, hold even for neural networks of small width, do not scale unfavorably with width, are algorithm-dependent, and consequently capture the role of initialization on the sample complexity of gradient descent for deep nets. Specialized to noiseless data separable with margin $\gamma$ by neural tangent kernel (NTK) features of a network of width $\Omega(\mathrm{poly}(\log(n)))$, we show the test-error rate to be…
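The setting described in the abstract (gradient descent on the logistic loss for a network with smooth activation) can be illustrated with a toy sketch. This is not the paper's code or experimental setup. The two-layer architecture, softplus activation, fixed ±1 outer weights, $1/\sqrt{m}$ scaling, width `m=16`, step size `eta=0.5`, Gaussian initialization, and four-point two-dimensional XOR dataset are all assumptions chosen for illustration.

```python
import math
import random

def softplus(z):
    # Numerically stable log(1 + e^z); a smooth activation (assumed choice).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # Derivative of softplus, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(W, a, x):
    # f(x) = (1/sqrt(m)) * sum_j a_j * softplus(<w_j, x>), outer weights a fixed.
    m = len(W)
    total = sum(a[j] * softplus(W[j][0] * x[0] + W[j][1] * x[1]) for j in range(m))
    return total / math.sqrt(m)

def logistic_risk(W, a, data):
    # Empirical logistic loss: mean of log(1 + exp(-y * f(x))).
    return sum(softplus(-y * predict(W, a, x)) for x, y in data) / len(data)

def gd_step(W, a, data, eta):
    # One full-batch gradient descent step on the hidden-layer weights W.
    m, n = len(W), len(data)
    grad = [[0.0, 0.0] for _ in range(m)]
    for x, y in data:
        # d/dt log(1 + e^{-t}) = -sigmoid(-t), evaluated at t = y * f(x).
        dloss = -y * sigmoid(-y * predict(W, a, x))
        for j in range(m):
            z = W[j][0] * x[0] + W[j][1] * x[1]
            coef = dloss * a[j] * sigmoid(z) / (math.sqrt(m) * n)
            grad[j][0] += coef * x[0]
            grad[j][1] += coef * x[1]
    return [[W[j][0] - eta * grad[j][0], W[j][1] - eta * grad[j][1]] for j in range(m)]

def train(m=16, eta=0.5, steps=200, seed=0):
    # Hypothetical toy XOR data: the label is the product of the two coordinates.
    rng = random.Random(seed)
    data = [((1.0, 1.0), 1.0), ((-1.0, -1.0), 1.0),
            ((1.0, -1.0), -1.0), ((-1.0, 1.0), -1.0)]
    W = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(m)]
    a = [1.0 if j % 2 == 0 else -1.0 for j in range(m)]
    losses = [logistic_risk(W, a, data)]
    for _ in range(steps):
        W = gd_step(W, a, data, eta)
        losses.append(logistic_risk(W, a, data))
    # Fraction of training points with non-positive margin y * f(x).
    train_error = sum(1 for x, y in data if y * predict(W, a, x) <= 0) / len(data)
    return losses, train_error
```

Calling `train()` returns the per-step empirical logistic loss and the final training error. Varying `eta` or `m` is a simple way to explore how step-size and width change the trajectory in this toy setting; it says nothing about the rates proved in the paper.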

Taxonomy

Topics: Neural Networks and Applications · Advanced Data Processing Techniques

Methods: Stochastic Gradient Descent · Neural Tangent Kernel · Sparse Evolutionary Training