On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Jian Li; Xuanyuan Luo; Mingda Qiao

arXiv:1902.00621 · cs.LG · March 3, 2020 · 24 cites

TL;DR

This paper introduces a new framework called Bayes-Stability to derive tighter, data-dependent generalization error bounds for noisy gradient methods like SGLD in non-convex learning, explaining phenomena observed in deep learning.

Contribution

The paper develops the Bayes-Stability framework combining PAC-Bayesian theory and stability to obtain improved generalization bounds for noisy gradient algorithms in non-convex settings.

Findings

01

New data-dependent generalization bounds for SGLD and other noisy gradient methods

02

Bounds can distinguish randomly labelled data from normal data

03

Improved bounds for continuous Langevin dynamics with ℓ2 regularization

Abstract

Generalization error (also known as the out-of-sample error) measures how well the hypothesis learned from training data generalizes to previously unseen data. Proving tight generalization error bounds is a central question in statistical learning theory. In this paper, we obtain generalization error bounds for learning general non-convex objectives, which has attracted significant attention in recent years. We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. The new framework combines ideas from both the PAC-Bayesian theory and the notion of algorithmic stability. Applying the Bayes-Stability method, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batch and acceleration, Entropy-SGD). Our result recovers…
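For readers unfamiliar with SGLD, the sketch below illustrates the kind of noisy gradient update the abstract refers to: a mini-batch gradient step followed by isotropic Gaussian noise. It is a minimal illustration, not the paper's algorithm or experimental setup. The noise scale `sqrt(2 * step_size / inverse_temp)` is the common Langevin parameterization and is an assumption here (the paper may parameterize the noise differently), and the names `sgld_step`, `minibatch_grad`, `run_sgld`, and `toy_data`, as well as the toy tanh loss, are hypothetical.

```python
import math
import random


def sgld_step(w, grad, step_size, inverse_temp, rng):
    """One SGLD update: gradient step plus isotropic Gaussian noise.

    Assumed noise scale: sqrt(2 * step_size / inverse_temp).
    """
    noise_scale = math.sqrt(2.0 * step_size / inverse_temp)
    return [wi - step_size * gi + noise_scale * rng.gauss(0.0, 1.0)
            for wi, gi in zip(w, grad)]


def minibatch_grad(w, batch):
    """Gradient of the mean of (tanh(w0 * x) - y)^2 over a mini-batch.

    A toy loss that is non-convex in w0, chosen only for illustration.
    """
    g = 0.0
    for x, y in batch:
        pred = math.tanh(w[0] * x)
        g += 2.0 * (pred - y) * (1.0 - pred * pred) * x
    return [g / len(batch)]


def run_sgld(data, steps=200, step_size=0.05, inverse_temp=100.0,
             batch_size=4, seed=0):
    """Run SGLD from w = [0.0] with mini-batches sampled without replacement."""
    rng = random.Random(seed)
    w = [0.0]
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        w = sgld_step(w, minibatch_grad(w, batch), step_size, inverse_temp, rng)
    return w


# Hypothetical toy data generated from y = tanh(1.5 * x).
toy_data = [(x / 10.0, math.tanh(1.5 * x / 10.0)) for x in range(-10, 11)]
w_final = run_sgld(toy_data)
```

A larger `inverse_temp` shrinks the injected noise, so the update approaches plain mini-batch SGD; a smaller value makes the iterates more random.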

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning