Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

Wenlong Mou; Liwei Wang; Xiyu Zhai; Kai Zheng

arXiv:1707.05947·cs.LG·July 20, 2017·55 cites

TL;DR

This paper provides two theoretical bounds on the generalization error of SGLD in non-convex learning, highlighting how limited iterations and step sizes influence model generalization, especially in deep learning.

Contribution

It introduces two non-asymptotic theories based on Stability and PAC-Bayesian analysis for SGLD, with bounds independent of model capacity measures.

Findings

01

Stability-based bound: $O\left(\frac{1}{n} L \sqrt{\beta T_k}\right)$

02

PAC-Bayesian bound: $O(1/\sqrt{n})$ with exponential decay factors

03

The bounds imply that fast training guarantees generalization in non-convex models
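
A rough numerical illustration of the stability-based rate in the first finding may help. The Python sketch below evaluates $\frac{1}{n} L \sqrt{\beta T_k}$ with the constant hidden by the $O(\cdot)$ dropped, so only ratios between values are meaningful. The function name `stability_rate` is hypothetical, and treating $T_k$ as the plain sum of the step sizes used so far is an assumption made here for illustration.

```python
import math


def stability_rate(n, lipschitz, beta, step_sizes):
    """Evaluate (1/n) * L * sqrt(beta * T_k), ignoring constant factors.

    Assumption: T_k is taken to be the sum of the step sizes in `step_sizes`.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    t_k = math.fsum(step_sizes)  # aggregated step size (assumed definition)
    return lipschitz * math.sqrt(beta * t_k) / n


# Hypothetical settings: same step size, four times as many iterations.
short_run = stability_rate(n=1000, lipschitz=2.0, beta=1.0, step_sizes=[0.01] * 100)
long_run = stability_rate(n=1000, lipschitz=2.0, beta=1.0, step_sizes=[0.01] * 400)
print(long_run / short_run)
```

Under these assumptions, quadrupling the number of steps at a fixed step size doubles the value, while doubling $n$ halves it, which matches the reading that shorter training runs favor generalization.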

Abstract

Algorithm-dependent generalization error bounds are central to statistical learning theory. A learning algorithm may use a large hypothesis space, but the limited number of iterations controls its model capacity and generalization error. The impacts of stochastic gradient methods on generalization error for non-convex learning problems not only have important theoretical consequences, but are also critical to the generalization errors of deep learning. In this paper, we study the generalization errors of Stochastic Gradient Langevin Dynamics (SGLD) with non-convex objectives. Two theories are proposed with non-asymptotic discrete-time analysis, using Stability and PAC-Bayesian results respectively. The stability-based theory obtains a bound of $O\left(\frac{1}{n} L \sqrt{\beta T_k}\right)$, where $L$ is the uniform Lipschitz parameter, $\beta$ is the inverse temperature, and $T_k$ is aggregated…
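
The abstract studies SGLD, but this excerpt does not spell out the update rule. As a minimal sketch, assuming the commonly used form $w_{k+1} = w_k - \eta_k g_k + \sqrt{2\eta_k/\beta}\,\xi_k$ with mini-batch gradient $g_k$ and standard Gaussian noise $\xi_k$, the iteration could look like the Python below. The names `sgld` and `toy_grad`, the toy non-convex loss, and all hyperparameter values are hypothetical; the paper's exact noise scaling and assumptions may differ.

```python
import math
import random


def sgld(grad_fn, data, w0, step_sizes, beta, batch_size=1, seed=0):
    """Run a basic SGLD loop and return the final iterate as a list of floats."""
    rng = random.Random(seed)
    w = list(w0)
    for eta in step_sizes:
        # Draw a mini-batch without replacement and average its gradients.
        batch = rng.sample(data, batch_size)
        grads = [grad_fn(w, z) for z in batch]
        g = [sum(col) / batch_size for col in zip(*grads)]
        # Gradient step plus Gaussian noise with variance 2 * eta / beta
        # per coordinate (assumed scaling, see the lead-in).
        noise_scale = math.sqrt(2.0 * eta / beta)
        w = [wi - eta * gi + noise_scale * rng.gauss(0.0, 1.0)
             for wi, gi in zip(w, g)]
    return w


def toy_grad(w, z):
    """Gradient of the toy non-convex loss sin(w - z) + 0.1 * (w - z)**2."""
    d = w[0] - z
    return [math.cos(d) + 0.2 * d]


# Hypothetical data and hyperparameters, chosen only for illustration.
data = [0.0, 0.5, 1.0, 1.5]
w_final = sgld(toy_grad, data, w0=[0.0], step_sizes=[0.1] * 50, beta=10.0, seed=1)
print(w_final)
```

A larger `beta` (lower temperature) shrinks the injected noise, so the loop approaches plain stochastic gradient descent; a smaller `beta` makes the iterates behave more like samples from a diffuse distribution.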

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques