Tight Generalization Error Bounds for Stochastic Gradient Descent in Non-convex Learning
Wenjun Xiong, Juan Ding, Xinlei Zuo, Qizhai Li

TL;DR
This paper introduces Type II perturbed SGD (T2pm-SGD), an analysis framework that yields tighter generalization error bounds for SGD in non-convex learning, and validates them with experiments on benchmark datasets.
Contribution
The paper develops T2pm-SGD and uses it to tighten generalization bounds for non-convex SGD by reducing both the trajectory term and the flatness term. The analysis covers both sub-Gaussian and bounded loss functions.
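The page does not define the flatness term. One common reading, assumed here and not taken from the paper, is the expected rise in loss when the weights are perturbed by Gaussian noise, $\mathbb{E}_{\xi}[L(w+\xi)] - L(w)$ with $\xi \sim \mathcal{N}(0, \sigma^2 I)$. The sketch below estimates that quantity by Monte Carlo for a hypothetical toy quadratic loss $L(w) = \tfrac12 w^\top H w$, for which the exact value is $\tfrac12 \sigma^2 \operatorname{tr}(H)$.

```python
import numpy as np

def quadratic_loss(W, H):
    """Hypothetical toy loss 0.5 * w^T H w, evaluated row-wise for a batch W of shape (m, d)."""
    return 0.5 * np.einsum("ij,jk,ik->i", W, H, W)

def flatness_gap(w, H, sigma, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E[L(w + xi)] - L(w) with xi ~ N(0, sigma^2 I).

    ASSUMPTION: this perturbation-based notion of flatness is an
    illustration only; the paper's exact flatness term may differ.
    """
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, sigma, size=(num_samples, w.size))
    # Antithetic pairs (+xi, -xi) cancel the linear term w^T H xi exactly,
    # leaving only the noise from the quadratic part 0.5 * xi^T H xi.
    perturbed = 0.5 * (quadratic_loss(w + xi, H) + quadratic_loss(w - xi, H))
    base = quadratic_loss(w[None, :], H)[0]
    return perturbed.mean() - base

H = np.diag([4.0, 1.0, 0.25])   # hypothetical curvature: one sharp, one flat direction
w = np.array([0.5, -1.0, 2.0])  # hypothetical weight vector
sigma = 0.1

estimate = flatness_gap(w, H, sigma)
exact = 0.5 * sigma**2 * np.trace(H)  # closed form for a quadratic loss
print(f"Monte Carlo: {estimate:.5f}   closed form: {exact:.5f}")
```

Under this reading the gap grows like $\sigma^2$, which is why the perturbation variance cannot be made arbitrarily large without inflating the flatness term.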
Findings
Trajectory term improved to $O(n^{-1})$ for bounded losses.
Overall generalization bound refined to $O(n^{-2/3})$ by choosing the variance of the perturbation noise optimally.
Experiments on the MNIST and CIFAR-10 datasets confirm the theoretical improvements.
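To put the two rates side by side: if a bound behaves like $C n^{-\alpha}$, halving it requires multiplying $n$ by $2^{1/\alpha}$, so twice the data for $\alpha = 1$ and about $2.83$ times the data for $\alpha = 2/3$. The constant $C$ is not given on this page, so the short sketch below covers only this scaling arithmetic.

```python
def sample_multiplier(alpha: float, reduction: float = 2.0) -> float:
    """Factor by which n must grow to shrink a C * n**(-alpha) bound by `reduction`."""
    return reduction ** (1.0 / alpha)

for name, alpha in [("trajectory term, O(n^-1)", 1.0), ("overall bound, O(n^-2/3)", 2.0 / 3.0)]:
    print(f"{name}: halving needs {sample_multiplier(alpha):.2f}x more samples")
```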
Abstract
Stochastic Gradient Descent (SGD) is fundamental for training deep neural networks, especially in non-convex settings. Understanding SGD's generalization properties is crucial for ensuring robust model performance on unseen data. In this paper, we analyze the generalization error bounds of SGD for non-convex learning by introducing Type II perturbed SGD (T2pm-SGD), which accommodates both sub-Gaussian and bounded loss functions. The generalization error bound is decomposed into two components: the trajectory term and the flatness term. Our analysis improves the trajectory term to $O(n^{-1})$, significantly enhancing the previous bound for bounded losses, where $n$ is the number of training samples and $b$ is the batch size. By selecting an optimal variance for the perturbation noise, the overall bound is further refined to $O(n^{-2/3})$. For sub-Gaussian loss functions,…
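The abstract does not give the explicit form of the bound, so the sketch below uses a hypothetical stand-in, $g(\sigma) = \frac{a}{\sigma n} + c\sigma^2$, where $\sigma^2$ is the perturbation-noise variance, the first summand plays the role of a trajectory term that shrinks as the noise grows (and is $O(n^{-1})$ for fixed $\sigma$), and the second plays the role of a flatness term that grows with the noise. This form is an assumption, picked only because it reproduces the stated rate: setting $g'(\sigma) = 0$ gives $\sigma^\ast = (a/(2cn))^{1/3}$ and $g(\sigma^\ast) = 3c^{1/3}(a/(2n))^{2/3} = O(n^{-2/3})$. The constants $a$ and $c$ are placeholders.

```python
import numpy as np

def toy_bound(sigma, n, a=1.0, c=1.0):
    """HYPOTHETICAL bound: trajectory-like a/(sigma*n) plus flatness-like c*sigma**2."""
    return a / (sigma * n) + c * sigma**2

def optimal_sigma(n, a=1.0, c=1.0):
    """Closed-form minimiser of toy_bound: solve -a/(sigma^2 n) + 2 c sigma = 0."""
    return (a / (2.0 * c * n)) ** (1.0 / 3.0)

# Bound value at the optimal noise level for several sample sizes.
ns = np.array([1e3, 1e4, 1e5, 1e6])
best = np.array([toy_bound(optimal_sigma(n), n) for n in ns])

# Slope of log(bound) against log(n); for this toy form it is -2/3.
slope = np.polyfit(np.log(ns), np.log(best), 1)[0]
print(f"fitted exponent: {slope:.4f}")

# Sanity-check the closed form against a log-spaced grid search over sigma.
n = 1e4
grid = np.logspace(-4, 0, 2001)
grid_best = grid[np.argmin(toy_bound(grid, n))]
print(f"closed-form sigma*: {optimal_sigma(n):.4f}   grid sigma*: {grid_best:.4f}")
```

The grid search is a cheap way to confirm the closed-form minimiser before relying on it; with the real bound, the same check would apply once its terms are known.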
Taxonomy
Methods: Stochastic Gradient Descent
