Generalization Bounds for Gradient Methods via Discrete and Continuous Prior

Xuanyuan Luo; Luo Bei; Jian Li

arXiv:2205.13799·cs.LG·October 12, 2022

TL;DR

This paper introduces a discrete data-dependent prior within the PAC-Bayesian framework and uses it to derive generalization bounds for gradient methods. The bounds apply to nonconvex and nonsmooth settings, and the reported numerical results include a $0.037$ test error bound on MNIST.

Contribution

It develops a new discrete-prior approach to PAC-Bayesian bounds and extends the analysis to nonconvex and nonsmooth settings and to Langevin dynamics, improving the understanding of algorithm-dependent generalization.

Findings

01. Provides high-probability generalization bounds for Floored GD and GLD.

02. Achieves a $0.037$ test error bound on MNIST.

03. Introduces bounds that apply to nonconvex and nonsmooth optimization.

Abstract

Proving algorithm-dependent generalization error bounds for gradient-type optimization methods has attracted significant attention recently in learning theory. However, most existing trajectory-based analyses require either restrictive assumptions on the learning rate (e.g., fast decreasing learning rate), or continuous injected noise (such as the Gaussian noise in Langevin dynamics). In this paper, we introduce a new discrete data-dependent prior to the PAC-Bayesian framework, and prove a high probability generalization bound of order $O\left(\frac{1}{n} \cdot \sum_{t=1}^{T} (\gamma_{t} / \varepsilon_{t})^{2} \|g_{t}\|^{2}\right)$ for Floored GD (i.e., a version of gradient descent with precision level $\varepsilon_{t}$), where $n$ is the number of training samples, $\gamma_{t}$ is the learning rate at step $t$, $g_{t}$ is roughly the difference of the gradient computed using all samples…
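To make the order of the bound concrete, the following is a minimal Python sketch that evaluates the quantity $\frac{1}{n} \sum_{t=1}^{T} (\gamma_{t} / \varepsilon_{t})^{2} \|g_{t}\|^{2}$ for given per-step values, with all constants hidden by the $O(\cdot)$ notation omitted. The function names `bound_order_term` and `floor_to_precision` are hypothetical and not taken from the paper. The flooring helper assumes that "precision level $\varepsilon_{t}$" means rounding each coordinate down to a multiple of $\varepsilon_{t}$; the exact Floored GD update is not given in the abstract, so this is an illustration rather than the authors' algorithm.

```python
import math


def floor_to_precision(vec, eps):
    """Round each coordinate of vec down to a multiple of eps.

    Assumption: this is one plausible reading of "precision level eps";
    the abstract does not spell out the actual Floored GD update.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return [eps * math.floor(v / eps) for v in vec]


def bound_order_term(gammas, epsilons, g_norms_sq, n):
    """Evaluate (1/n) * sum_t (gamma_t / eps_t)^2 * ||g_t||^2.

    gammas:      learning rates gamma_t, one per step
    epsilons:    precision levels eps_t, one per step
    g_norms_sq:  squared norms ||g_t||^2, one per step
    n:           number of training samples
    Constants hidden by the O(.) notation are omitted.
    """
    if not (len(gammas) == len(epsilons) == len(g_norms_sq)):
        raise ValueError("per-step sequences must have equal length")
    if n <= 0:
        raise ValueError("n must be positive")
    total = sum((g / e) ** 2 * s
                for g, e, s in zip(gammas, epsilons, g_norms_sq))
    return total / n


# Two steps with gamma_t = eps_t, so each ratio is 1 and the sum is 4 + 1.
term = bound_order_term([0.1, 0.1], [0.1, 0.1], [4.0, 1.0], n=100)
floored = floor_to_precision([1.3, -1.3], 0.5)
```

The expression shows the trade-off visible in the bound: for a fixed learning rate, a coarser precision level (larger $\varepsilon_{t}$) shrinks each term, and the whole sum decays as $1/n$.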

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent