Generalization Bounds for Gradient Methods via Discrete and Continuous Prior
Xuanyuan Luo, Luo Bei, Jian Li

TL;DR
This paper introduces a discrete data-dependent prior within the PAC-Bayesian framework and uses it to derive generalization bounds for gradient methods. The bounds apply to nonconvex and nonsmooth settings and are supported by both theoretical and numerical results.
Contribution
It develops a new discrete-prior approach to PAC-Bayesian bounds and extends the analysis to nonconvex and nonsmooth objectives and to Langevin dynamics, improving the understanding of algorithm-dependent generalization.
Findings
Provides high-probability generalization bounds for Floored GD and gradient Langevin dynamics (GLD).
Achieves a $0.037$ test error bound on MNIST.
Introduces bounds applicable to nonconvex and nonsmooth optimization.
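For readers unfamiliar with GLD, the following is a minimal sketch of one common form of the update, in which isotropic Gaussian noise is added to each gradient step. The function names (`gld_step`, `quadratic_grad`), the noise scale `sigma`, and the toy quadratic objective are illustrative assumptions and are not taken from the paper.

```python
import random


def gld_step(w, grad, lr, sigma, rng):
    """One noisy gradient step: w <- w - lr * grad(w) + sigma * N(0, I).

    `sigma` is a hypothetical noise scale chosen for illustration.
    """
    g = grad(w)
    return [wi - lr * gi + sigma * rng.gauss(0.0, 1.0) for wi, gi in zip(w, g)]


def quadratic_grad(w):
    # Gradient of the toy objective f(w) = 0.5 * ||w||^2 (an assumption).
    return list(w)


rng = random.Random(0)  # fixed seed so the run is reproducible
w = [1.0, -2.0]
for _ in range(100):
    w = gld_step(w, quadratic_grad, lr=0.1, sigma=0.01, rng=rng)
```

With `sigma` set to zero the step reduces to plain gradient descent, which makes the role of the continuous injected noise easy to isolate.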
Abstract
Proving algorithm-dependent generalization error bounds for gradient-type optimization methods has attracted significant attention recently in learning theory. However, most existing trajectory-based analyses require either restrictive assumptions on the learning rate (e.g., a fast-decreasing learning rate) or continuous injected noise (such as the Gaussian noise in Langevin dynamics). In this paper, we introduce a new discrete data-dependent prior to the PAC-Bayesian framework and prove a high-probability generalization bound for Floored GD (i.e., a version of gradient descent with a given precision level). The bound is expressed in terms of the number of training samples, the learning rate at each step, and a quantity that is roughly the difference of the gradient computed using all samples…
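To make "gradient descent with a precision level" concrete, here is a hedged sketch of one possible reading. It assumes the update splits into a step along a gradient computed on a subset of the data, plus a residual (the full gradient minus that subset gradient) rounded down to a multiple of the precision `eps`. The abstract is truncated, so this split, the function name `floored_gd_step`, and the arguments `grad_prior` and `grad_full` are assumptions for illustration, not the paper's definition.

```python
import math


def floored_gd_step(w, grad_prior, grad_full, lr, eps):
    """One hypothetical floored step.

    w <- w - lr * g_prior - eps * floor(lr * (g_full - g_prior) / eps)

    The residual term only ever moves w by integer multiples of eps,
    so it takes values in a discrete grid.
    """
    g_prior = grad_prior(w)
    g_full = grad_full(w)
    new_w = []
    for wi, gp, gf in zip(w, g_prior, g_full):
        residual = lr * (gf - gp)
        floored = eps * math.floor(residual / eps)  # round down to the eps grid
        new_w.append(wi - lr * gp - floored)
    return new_w


# Toy usage with constant gradients (purely illustrative values).
w_next = floored_gd_step(
    [0.0],
    grad_prior=lambda w: [1.0],
    grad_full=lambda w: [1.6],
    lr=1.0,
    eps=0.25,
)
```

Because the floored term is discrete, a prior supported on the same grid can assign it positive probability, which is the kind of situation a discrete prior is suited to.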
Taxonomy
Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
