Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints
Wenlong Mou, Liwei Wang, Xiyu Zhai, Kai Zheng

TL;DR
This paper provides two theoretical bounds on the generalization error of SGLD in non-convex learning, highlighting how limited iterations and step sizes influence model generalization, especially in deep learning.
Contribution
It introduces two non-asymptotic theories based on Stability and PAC-Bayesian analysis for SGLD, with bounds independent of model capacity measures.
Findings
Stability-based bound: $O\left(\frac{1}{n} L\sqrt{\beta T_k}\right)$
PAC-Bayesian bound: $O(1/\sqrt{n})$ with exponential decay factors
Bounds imply fast training guarantees generalization in non-convex models
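To make the scaling of the stability-based bound concrete, the following is a minimal sketch that evaluates $\frac{1}{n} L\sqrt{\beta T_k}$ with all constants dropped. The function name `stability_bound_scale` is hypothetical, and it assumes that the aggregated step size $T_k$ is the plain sum of the step sizes used so far; the exact definition and constants should be taken from the paper.

```python
import math


def stability_bound_scale(n, lipschitz, beta, step_sizes):
    """Scale of the stability bound, (L / n) * sqrt(beta * T_k), constants dropped.

    n          -- number of training samples
    lipschitz  -- uniform Lipschitz parameter L
    beta       -- inverse temperature
    step_sizes -- step sizes of the iterations run so far
    """
    # Assumption: T_k aggregates the step sizes by summing them.
    t_k = sum(step_sizes)
    return lipschitz * math.sqrt(beta * t_k) / n


# Illustrative values only: 100 samples, L = 2, beta = 4, four steps of size 0.25.
scale = stability_bound_scale(100, 2.0, 4.0, [0.25] * 4)
```

Under this reading, doubling the sample size halves the quantity, while running four times as many steps of the same size only doubles it, which is one way to read the "fast training guarantees generalization" message.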
Abstract
Algorithm-dependent generalization error bounds are central to statistical learning theory. A learning algorithm may use a large hypothesis space, but the limited number of iterations controls its model capacity and generalization error. The impacts of stochastic gradient methods on generalization error for non-convex learning problems not only have important theoretical consequences, but are also critical to generalization errors of deep learning. In this paper, we study the generalization errors of Stochastic Gradient Langevin Dynamics (SGLD) with non-convex objectives. Two theories are proposed with non-asymptotic discrete-time analysis, using Stability and PAC-Bayesian results respectively. The stability-based theory obtains a bound of $O\left(\frac{1}{n} L\sqrt{\beta T_k}\right)$, where $L$ is the uniform Lipschitz parameter, $\beta$ is the inverse temperature, and $T_k$ is aggregated…
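For readers unfamiliar with SGLD, the sketch below shows the commonly used form of the update, in which each iterate takes a gradient step and adds Gaussian noise scaled by $\sqrt{2\eta/\beta}$ for step size $\eta$ and inverse temperature $\beta$. It is an illustration under stated assumptions, not the paper's exact algorithm: `sgld`, `toy_grad`, and the toy objective are hypothetical, and `grad_fn` stands in for a minibatch gradient estimate.

```python
import math
import random


def sgld(grad_fn, w0, step_sizes, beta, seed=0):
    """Run SGLD from w0 and return the final iterate as a list of floats.

    Assumed update for each step size eta:
        w <- w - eta * grad_fn(w) + sqrt(2 * eta / beta) * N(0, I)
    """
    rng = random.Random(seed)
    w = list(w0)
    for eta in step_sizes:
        g = grad_fn(w)  # stand-in for a stochastic (minibatch) gradient
        noise_scale = math.sqrt(2.0 * eta / beta)
        w = [
            wi - eta * gi + noise_scale * rng.gauss(0.0, 1.0)
            for wi, gi in zip(w, g)
        ]
    return w


def toy_grad(w):
    """Gradient of the non-convex toy objective sum(w_i**4 / 4 - w_i**2 / 2)."""
    return [wi ** 3 - wi for wi in w]


# Illustrative run: 200 steps of size 0.01 at inverse temperature 100.
w_final = sgld(toy_grad, [0.5, -0.5], [0.01] * 200, beta=100.0)
```

A larger `beta` shrinks the injected noise, so the iterates behave more like plain gradient descent; the stability bound grows with $\sqrt{\beta}$ accordingly.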
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques
