Generalisation under gradient descent via deterministic PAC-Bayes

Eugenio Clerico; Tyler Farghly; George Deligiannidis; Benjamin Guedj; Arnaud Doucet

arXiv:2209.02525 · stat.ML · February 12, 2025 · 1 citation

Open Access · 4 Reviews

TL;DR

This paper develops new deterministic PAC-Bayesian generalisation bounds for models trained with gradient descent and related algorithms, providing fully computable guarantees that depend on initial conditions and the training trajectory.

Contribution

The paper introduces disintegrated PAC-Bayesian bounds that apply directly to deterministic optimisation algorithms, with no de-randomisation step, extending the theoretical understanding of generalisation in gradient-based training.

Findings

01. The bounds apply to stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.

02. The bounds depend on the density of the initial distribution and the Hessian of the training objective along the training trajectory.

03. The framework provides fully computable generalisation guarantees.

Abstract

We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.
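
To make the dependence on the initial density and the trajectory Hessian more concrete, here is a minimal sketch in Python (NumPy). It is not the paper's bound. It only illustrates the standard change-of-variables identity: a gradient step θ ↦ θ − η∇L(θ) has Jacobian I − η∇²L(θ), so the log-density of the iterate can be tracked from the initial log-density and the Hessians met along the way, provided each step is invertible. The function name `gd_log_density`, the toy quadratic objective, the step size, and the Gaussian initial distribution are all assumptions chosen for illustration.

```python
import numpy as np

def gd_log_density(theta0, grad, hess, log_rho0, step, n_steps):
    """Run gradient descent from theta0 and track the iterate's log-density.

    Each step theta -> theta - step * grad(theta) has Jacobian
    I - step * hess(theta). By the change-of-variables formula, the
    log-density of the iterate decreases by log|det(Jacobian)| per step.
    Assumes every step map is invertible (hypothetical helper, not from the paper).
    """
    theta = np.asarray(theta0, dtype=float)
    log_rho = log_rho0(theta)
    dim = theta.size
    for _ in range(n_steps):
        jac = np.eye(dim) - step * hess(theta)
        sign, logabsdet = np.linalg.slogdet(jac)
        if sign == 0:
            raise ValueError("singular step Jacobian: density is not defined")
        log_rho -= logabsdet
        theta = theta - step * grad(theta)
    return theta, log_rho

# Toy objective L(theta) = 0.5 * theta^T A theta (assumed for illustration).
A = np.array([[2.0, 0.5], [0.5, 1.0]])

def grad(theta):
    return A @ theta

def hess(theta):
    return A

def log_rho0(theta):
    # Standard Gaussian initial density N(0, I).
    return -0.5 * theta @ theta - 0.5 * theta.size * np.log(2 * np.pi)

theta0 = np.array([0.3, -1.2])
theta_T, log_rho_T = gd_log_density(theta0, grad, hess, log_rho0,
                                    step=0.1, n_steps=20)

# Cross-check: for a quadratic objective the iterate is linear in theta0,
# theta_T = M theta0 with M = (I - step * A)^n_steps, so theta_T ~ N(0, M M^T).
M = np.linalg.matrix_power(np.eye(2) - 0.1 * A, 20)
cov = M @ M.T
_, logdet_cov = np.linalg.slogdet(cov)
closed_form = (-0.5 * theta_T @ np.linalg.solve(cov, theta_T)
               - 0.5 * logdet_cov - np.log(2 * np.pi))
```

In this toy case the tracked value `log_rho_T` should agree with `closed_form` up to floating-point error, since both describe the same Gaussian density evaluated at `theta_T`. For a general objective no closed form is available, but the tracked quantity still needs only the initial density and the Hessians along the trajectory.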

Peer Reviews

Decision · ALT 2025

Reviewer 01 · Rating: Accept · Confidence: 4

Reviewer 02 · Rating: 6 · Confidence: 3

Reviewer 03 · Rating: 7 · Confidence: 4

Reviewer 04 · Rating: 6 · Confidence: 4

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Machine Learning and Algorithms