Generalization of the Gibbs algorithm with high probability at low temperatures
Andreas Maurer

TL;DR
This paper provides a probabilistic bound on the generalization error of the Gibbs algorithm across temperature ranges, emphasizing the role of the loss landscape and flat minima, with implications for stochastic algorithms.
Contribution
It extends existing bounds to low temperatures, linking generalization to the data-dependent loss landscape and prior volume, supporting the importance of flat minima.
Findings
High-probability bounds on generalization error at low temperatures
Generalization depends on the data-dependent loss landscape
Supports the benefit of flat minima in optimization
Abstract
The paper gives a bound on the generalization error of the Gibbs algorithm that recovers known data-independent bounds in the high-temperature range and extends to the low-temperature range, where generalization depends critically on the data-dependent loss landscape. It is shown that, with high probability, the generalization error of a single hypothesis drawn from the Gibbs posterior decreases with the total prior volume of all hypotheses with similar or smaller empirical error. This gives theoretical support to the belief in the benefit of flat minima. The zero-temperature limit is discussed, and the bound is extended to a class of similar stochastic algorithms.
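As a rough illustration, not taken from the paper, the following sketch assumes a finite hypothesis set and the standard form of the Gibbs posterior, whose weights are proportional to π(h)·exp(−β·L̂(h)), with prior π, empirical loss L̂ and inverse temperature β. The function names `gibbs_posterior` and `sublevel_prior_volume` and the toy numbers are hypothetical. The sketch shows that β = 0 returns the prior (the high-temperature end), that large β concentrates the posterior on the empirical minimizers in proportion to their prior mass (the zero-temperature limit), and it computes the prior volume of hypotheses whose empirical loss is no larger than that of a sampled hypothesis, a simplified stand-in for the "similar or smaller empirical error" set in the abstract. The bound itself is not implemented here.

```python
import numpy as np

def gibbs_posterior(prior, emp_loss, beta):
    """Gibbs posterior on a finite hypothesis set (illustrative assumption).

    prior:    prior probabilities pi(h), shape (n,), strictly positive, summing to 1
    emp_loss: empirical losses L_hat(h) on the training sample, shape (n,)
    beta:     inverse temperature; small beta = high temperature,
              beta -> infinity = zero-temperature limit
    """
    # Unnormalized log-weights log pi(h) - beta * L_hat(h).
    log_w = np.log(prior) - beta * emp_loss
    # Subtract the maximum before exponentiating to avoid underflow at large beta.
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum()

def sublevel_prior_volume(prior, emp_loss, h):
    """Total prior mass of hypotheses whose empirical loss is <= that of hypothesis h."""
    return prior[emp_loss <= emp_loss[h]].sum()

# Hypothetical toy example: four hypotheses, two of which share the minimal empirical loss.
prior = np.array([0.1, 0.4, 0.3, 0.2])
emp_loss = np.array([0.05, 0.05, 0.2, 0.5])

for beta in [0.0, 10.0, 1000.0]:
    print(beta, gibbs_posterior(prior, emp_loss, beta).round(3))

# Draw a single hypothesis from the Gibbs posterior at a moderate temperature
# and report the prior volume of its empirical-loss sublevel set.
rng = np.random.default_rng(0)
h = rng.choice(len(prior), p=gibbs_posterior(prior, emp_loss, 10.0))
print(h, sublevel_prior_volume(prior, emp_loss, h))
```

In this toy setting a minimizer that carries a larger share of prior mass (hypothesis 1 versus hypothesis 0) receives proportionally more posterior mass as β grows, which is the finite-set analogue of a "flat" minimum occupying more prior volume.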
Taxonomy
Topics: Neural Networks and Applications
