On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning
Jian Li, Xuanyuan Luo, Mingda Qiao

TL;DR
This paper introduces a new framework called Bayes-Stability to derive tighter, data-dependent generalization error bounds for noisy gradient methods like SGLD in non-convex learning, explaining phenomena observed in deep learning.
Contribution
The paper develops the Bayes-Stability framework combining PAC-Bayesian theory and stability to obtain improved generalization bounds for noisy gradient algorithms in non-convex settings.
Findings
New data-dependent bounds for SGLD and similar methods
Bounds can distinguish randomly labeled data from normally labeled data
Improved bounds for Langevin dynamics with regularization
Abstract
Generalization error (also known as the out-of-sample error) measures how well the hypothesis learned from training data generalizes to previously unseen data. Proving tight generalization error bounds is a central question in statistical learning theory. In this paper, we obtain generalization error bounds for learning general non-convex objectives, which has attracted significant attention in recent years. We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. The new framework combines ideas from both the PAC-Bayesian theory and the notion of algorithmic stability. Applying the Bayes-Stability method, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batch and acceleration, Entropy-SGD). Our result recovers…
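For readers unfamiliar with SGLD, the following is a minimal sketch of the update rule in its commonly used form: a mini-batch gradient step plus isotropic Gaussian noise. It is not taken from the paper. The function names (`sgld_step`, `least_squares_grad`, `run_sgld`), the least-squares loss, and the noise scale `sqrt(2 * step_size / inv_temp)` are illustrative assumptions, and the paper may parameterize the step size and noise level differently.

```python
import numpy as np


def sgld_step(w, grad_fn, batch, step_size, inv_temp, rng):
    """One SGLD update: gradient step on a mini-batch plus Gaussian noise.

    Assumed (textbook) form: w <- w - eta * g(w) + sqrt(2 * eta / beta) * xi,
    where xi ~ N(0, I). The paper's own scaling may differ.
    """
    noise = rng.standard_normal(w.shape)
    noise_scale = np.sqrt(2.0 * step_size / inv_temp)
    return w - step_size * grad_fn(w, batch) + noise_scale * noise


def least_squares_grad(w, batch):
    """Gradient of the mean squared error 0.5 * mean((X @ w - y) ** 2)."""
    X, y = batch
    return X.T @ (X @ w - y) / len(y)


def run_sgld(X, y, steps=500, batch_size=16, step_size=0.05,
             inv_temp=1e4, seed=0):
    """Run SGLD on a toy least-squares problem (hypothetical example)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Sample a mini-batch without replacement for this step.
        idx = rng.choice(len(y), size=batch_size, replace=False)
        w = sgld_step(w, least_squares_grad, (X[idx], y[idx]),
                      step_size, inv_temp, rng)
    return w
```

Setting `inv_temp` to infinity removes the noise term, so the update reduces to plain mini-batch gradient descent. The injected noise is what separates the noisy gradient methods named in the abstract from their deterministic counterparts.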
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
