Tight Generalization Error Bounds for Stochastic Gradient Descent in Non-convex Learning

Wenjun Xiong; Juan Ding; Xinlei Zuo; Qizhai Li

arXiv:2506.18645·stat.ML·June 24, 2025

TL;DR

This paper introduces Type II perturbed SGD (T2pm-SGD) to analyze SGD in non-convex learning, yielding tighter generalization error bounds that are validated by experiments on benchmark datasets.

Contribution

The paper develops T2pm-SGD, which accommodates both sub-Gaussian and bounded loss functions, and uses it to tighten the trajectory and flatness terms of the generalization bound for non-convex SGD.

Findings

01

Improved trajectory error bound to $O(n^{-1})$ for bounded losses.

02

Refined overall generalization bound to $O(n^{-2/3})$ with optimal noise variance.

03

Experimental validation on MNIST and CIFAR-10 datasets confirms theoretical improvements.

Abstract

Stochastic Gradient Descent (SGD) is fundamental for training deep neural networks, especially in non-convex settings. Understanding SGD's generalization properties is crucial for ensuring robust model performance on unseen data. In this paper, we analyze the generalization error bounds of SGD for non-convex learning by introducing the Type II perturbed SGD (T2pm-SGD), which accommodates both sub-Gaussian and bounded loss functions. The generalization error bound is decomposed into two components: the trajectory term and the flatness term. Our analysis improves the trajectory term to $O(n^{-1})$, significantly enhancing the previous $O((nb)^{-1/2})$ bound for bounded losses, where $n$ is the number of training samples and $b$ is the batch size. By selecting an optimal variance for the perturbation noise, the overall bound is further refined to $O(n^{-2/3})$. For sub-Gaussian loss functions,…
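The abstract does not give the explicit form of the two terms, so the following is only an illustrative sketch of how an $O(n^{-2/3})$ rate can arise from tuning the noise level. It assumes, hypothetically, a trajectory term of the form $a/(n\sigma)$ and a flatness term of the form $c\,\sigma^2$, where $\sigma$ is the perturbation standard deviation and $a$, $c$ are placeholder constants; none of these names or forms are taken from the paper. Under that assumption, setting the derivative with respect to $\sigma$ to zero gives $\sigma^3 = a/(2cn)$, so $\sigma \propto n^{-1/3}$ and both terms scale as $n^{-2/3}$.

```python
import math


def bound(sigma, n, a=1.0, c=1.0):
    """Hypothetical bound (an assumption, not the paper's expression):
    trajectory term a / (n * sigma) plus flatness term c * sigma**2."""
    return a / (n * sigma) + c * sigma ** 2


def optimal_sigma(n, a=1.0, c=1.0):
    # d/dsigma [a / (n * sigma) + c * sigma**2] = 0  =>  sigma**3 = a / (2 * c * n)
    return (a / (2.0 * c * n)) ** (1.0 / 3.0)


def fitted_rate(n_small, n_large, a=1.0, c=1.0):
    """Log-log slope of the optimized bound between two sample sizes."""
    b_small = bound(optimal_sigma(n_small, a, c), n_small, a, c)
    b_large = bound(optimal_sigma(n_large, a, c), n_large, a, c)
    return (math.log(b_large) - math.log(b_small)) / (
        math.log(n_large) - math.log(n_small)
    )


# Exponent of n in the optimized bound under the assumed form.
rate = fitted_rate(10**3, 10**6)
```

With $\sigma$ held fixed, the assumed trajectory term alone decays as $n^{-1}$; the slower $n^{-2/3}$ overall rate in this sketch comes from shrinking $\sigma$ with $n$ to keep the flatness term under control.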


Taxonomy

Methods: Stochastic Gradient Descent