Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
Yuan Cao, Quanquan Gu

TL;DR
This paper provides new theoretical generalization bounds for wide, over-parameterized neural networks trained with SGD, linking the bounds to neural tangent models and kernel methods, and showing they are independent of network width.
Contribution
It introduces a novel generalization bound based on neural tangent random features, applicable to wide neural networks, and connects these bounds to neural tangent kernels.
Findings
Generalization error bound of order Õ(n^{-1/2}), independent of network width
Bound applies to networks trained with SGD and random initialization
Establishes a connection between generalization bounds and neural tangent kernel
Abstract
We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by an NTRF model with sufficiently small error, our result yields a generalization error bound in the order of Õ(n^{-1/2}) that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In…
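To make the NTRF idea in the abstract concrete, here is a minimal sketch of a random feature model induced by the network gradient at initialization: the prediction is the network output at the initial weights W0 plus the inner product of the gradient at W0 with the weight change W - W0. It assumes a two-layer ReLU network with a fixed random ±1 output layer, Gaussian N(0, 2/m) hidden weights, and no width-dependent output scaling. These are simplifications chosen for illustration, not the paper's deep-network setup, and the function names (init_params, forward, grad_W, ntrf_predict) are hypothetical.

```python
import math
import random


def init_params(d, m, seed=0):
    """Random initialization (assumed scheme): hidden weights ~ N(0, 2/m),
    output weights fixed to random signs and never trained."""
    rng = random.Random(seed)
    W0 = [[rng.gauss(0.0, math.sqrt(2.0 / m)) for _ in range(d)] for _ in range(m)]
    a = [rng.choice((-1.0, 1.0)) for _ in range(m)]
    return W0, a


def preact(w_j, x):
    """Pre-activation <w_j, x> of one hidden unit."""
    return sum(w * xi for w, xi in zip(w_j, x))


def forward(W, a, x):
    """Two-layer ReLU network: f_W(x) = sum_j a_j * relu(<w_j, x>)."""
    return sum(a_j * max(0.0, preact(w_j, x)) for w_j, a_j in zip(W, a))


def grad_W(W, a, x):
    """Gradient of f_W(x) w.r.t. the hidden weights:
    d f / d w_j = a_j * 1[<w_j, x> > 0] * x."""
    grads = []
    for w_j, a_j in zip(W, a):
        active = 1.0 if preact(w_j, x) > 0.0 else 0.0
        grads.append([a_j * active * xi for xi in x])
    return grads


def ntrf_predict(W0, a, W, x):
    """NTRF-style prediction: f_{W0}(x) + <grad_W f_{W0}(x), W - W0>.
    The gradient at initialization plays the role of a fixed random feature map."""
    g = grad_W(W0, a, x)
    linear_term = sum(
        g_jk * (w_jk - w0_jk)
        for g_j, w_j, w0_j in zip(g, W, W0)
        for g_jk, w_jk, w0_jk in zip(g_j, w_j, w0_j)
    )
    return forward(W0, a, x) + linear_term
```

In this simplified setting the linearized model agrees with the network itself whenever the weight change does not flip any ReLU activation for the input x, which gives some intuition for why weights that stay close to initialization can be analyzed through the random feature model.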
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications
