Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

Jeffrey Negrea; Mahdi Haghifam; Gintare Karolina Dziugaite; Ashish Khisti; Daniel M. Roy

arXiv:1911.02151 · stat.ML · January 28, 2020 · 38 cites

TL;DR

This paper introduces improved data-dependent mutual information bounds for Stochastic Gradient Langevin Dynamics, linking them to flatness of the empirical risk surface and demonstrating significantly tighter bounds than previous methods.

Contribution

It develops novel data-dependent estimates for mutual information bounds in SGLD, enhancing the theoretical understanding of its generalization performance.
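
For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of a generic SGLD loop: a mini-batch gradient step plus isotropic Gaussian noise. It is not taken from the paper or its repository. The function names, the squared-loss example, and the parameters `step_size` and `inv_temp` (inverse temperature) are all assumptions made for illustration.

```python
import numpy as np

def sgld_step(w, grad_fn, batch, step_size, inv_temp, rng):
    """One generic SGLD update (illustrative, not the paper's code)."""
    grad = grad_fn(w, batch)               # mini-batch gradient of the empirical risk
    noise = rng.standard_normal(w.shape)   # isotropic Gaussian noise
    # Gradient step plus noise scaled by sqrt(2 * step_size / inverse temperature).
    return w - step_size * grad + np.sqrt(2.0 * step_size / inv_temp) * noise

def squared_loss_grad(w, batch):
    """Gradient of the halved mean squared error on a mini-batch (assumed example loss)."""
    X, y = batch
    return X.T @ (X @ w - y) / len(y)

def run_sgld(X, y, n_steps=500, batch_size=16, step_size=0.05, inv_temp=1e4, seed=0):
    """Run SGLD on linear regression; all hyperparameters are hypothetical defaults."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # Sample a mini-batch without replacement from the training set.
        idx = rng.choice(len(y), size=batch_size, replace=False)
        w = sgld_step(w, squared_loss_grad, (X[idx], y[idx]), step_size, inv_temp, rng)
    return w
```

In this sketch a larger `inv_temp` means less injected noise, so the iterates behave more like plain SGD. The stepwise analyses mentioned in the abstract bound how much information each such noisy update reveals about the training data.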

Findings

01. In empirical investigations, the terms in the bounds are orders of magnitude smaller than in previous bounds that depend on squared gradient norms

02. The approach applies broadly within existing information-theoretic frameworks

03. The bounds can be tied to a measure of flatness of the empirical risk surface

Abstract

In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via data-dependent estimates. Our approach is based on the variational characterization of mutual information and the use of data-dependent priors that forecast the mini-batch gradient based on a subset of the training samples. Our approach is broadly applicable within the information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky (2017). Our bound can be tied to a measure of flatness of the empirical risk surface. As compared with other bounds that depend on the squared norms of gradients, empirical investigations show that the terms in our bounds are orders of…
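
For orientation, the two standard ingredients named in the abstract can be written as follows. This is a sketch in assumed notation, not the paper's own statement: S is the training sample of size n, W the learned weights, R and \hat{R}_S the population and empirical risk, Q a prior on the weights, and σ a subgaussian parameter of the loss. The paper's data-dependent refinement is not reproduced here.

```latex
% Mutual information generalization bound (Xu and Raginsky, 2017),
% assuming the loss is \sigma-subgaussian under the data distribution:
\left| \mathbb{E}\big[ R(W) - \hat{R}_S(W) \big] \right|
  \le \sqrt{ \frac{2\sigma^2}{n} \, I(W; S) }

% Variational characterization of mutual information:
I(W; S) = \inf_{Q} \, \mathbb{E}_S\big[ \mathrm{KL}\big( P_{W \mid S} \,\|\, Q \big) \big]

% Consequently, any fixed prior Q yields an upper bound:
I(W; S) \le \mathbb{E}_S\big[ \mathrm{KL}\big( P_{W \mid S} \,\|\, Q \big) \big]
```

A prior that stays close to the distribution of the weights given the data makes the KL term small. Per the abstract, the paper goes further and lets the prior depend on a subset of the training samples, which it uses to forecast the mini-batch gradient.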

Code & Models

Repositories

jnegrea/neurips2019-5904-code
Official

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning