Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes
Itamar Harel, Yonathan Wolanowsky, Gal Vardi, Nathan Srebro, Daniel Soudry

TL;DR
This paper uses a thermodynamic perspective to show that the bound on the generalization gap during training with Langevin dynamics depends on the temperature and the initial loss, not on training duration or model complexity.
Contribution
It provides a novel, simple bound on the generalization gap for Langevin dynamics that is independent of training time, mixing, and model properties, based on thermodynamic principles.
Findings
Generalization gap bound depends on temperature and initial loss
No dependence on training time or model dimensionality
Bound holds for any Markov process with Gibbs stationary distribution
Abstract
We analyze the generalization gap (the gap between the training and test errors) when training a potentially over-parametrized model using a Markovian stochastic training algorithm, initialized from some distribution $\theta_0 \sim p_0$. We focus on Langevin dynamics with a positive temperature $\beta^{-1}$, i.e. gradient descent on a training loss $L$ with infinitesimal step size, perturbed with $\beta^{-1}$-variance Gaussian noise, and lightly regularized or bounded. There, we bound the generalization gap, at any time during training, by $\sqrt{(\beta\,\mathbb{E} L(\theta_0) + \log(1/\delta))/N}$ with probability $1-\delta$ over the dataset, where $N$ is the sample size, and $\mathbb{E} L(\theta_0) = O(1)$ with standard initialization scaling. In contrast to previous guarantees, we have no dependence on either training time or reliance on mixing, nor a dependence on dimensionality,…
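The training process described in the abstract (gradient descent with infinitesimal step size plus Gaussian noise at a positive temperature) can be approximated numerically with an Euler-Maruyama discretization. The sketch below is illustrative only and is not taken from the paper: the names `langevin_trajectory`, `quadratic_grad`, `beta` (inverse temperature), `step`, and the quadratic toy loss are all assumptions, and the `sqrt(2 / beta)` noise scaling is the common convention under which the stationary density is proportional to `exp(-beta * L)`; the paper's own normalization may differ.

```python
import math
import random


def langevin_trajectory(grad, theta0, beta, step=1e-2, n_steps=500, seed=0):
    """Euler-Maruyama sketch of d(theta) = -grad L(theta) dt + sqrt(2/beta) dW.

    grad   : function mapping a parameter list to its gradient list (hypothetical)
    theta0 : initial parameters (a list of floats)
    beta   : inverse temperature (assumed convention, see lead-in)
    """
    rng = random.Random(seed)
    theta = list(theta0)
    # Standard deviation of the Gaussian perturbation added at each discrete step.
    noise_scale = math.sqrt(2.0 * step / beta)
    for _ in range(n_steps):
        g = grad(theta)
        theta = [t - step * gi + noise_scale * rng.gauss(0.0, 1.0)
                 for t, gi in zip(theta, g)]
    return theta


def quadratic_grad(theta):
    # Gradient of the toy loss L(theta) = 0.5 * ||theta||^2 (an assumption for illustration).
    return list(theta)


# Run many independent chains and estimate the second moment of the final iterate.
# For this toy loss the Gibbs distribution exp(-beta * L) is Gaussian with variance 1/beta.
beta = 4.0
samples = [langevin_trajectory(quadratic_grad, [3.0], beta, seed=s)[0]
           for s in range(400)]
empirical_second_moment = sum(x * x for x in samples) / len(samples)
```

With a small `step`, `empirical_second_moment` should land close to `1 / beta`, which is a quick sanity check that the discretized chain approaches the Gibbs-type stationary distribution the findings refer to.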
Taxonomy
Topics: Protein Structure and Dynamics · Gene Regulatory Network Analysis · Complex Network Analysis Techniques
