Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes

Itamar Harel; Yonathan Wolanowsky; Gal Vardi; Nathan Srebro; Daniel Soudry

arXiv:2505.19087·cs.LG·October 21, 2025

TL;DR

This paper shows that, when training with Langevin dynamics, the generalization gap is bounded in terms of the temperature, the initial loss, and the sample size, with no dependence on training duration or model complexity. The argument takes a thermodynamic perspective.

Contribution

It provides a novel, simple bound on the generalization gap for Langevin dynamics that is independent of training time, mixing, and model properties, based on thermodynamic principles.

Findings

01 Generalization gap bound depends on temperature and initial loss

02 No dependence on training time or model dimensionality

03 Bound holds for any Markov process with Gibbs stationary distribution
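
As an illustration of the kind of process these findings refer to, the following is a minimal sketch of discretized Langevin dynamics on a toy quadratic loss. It is not code from the paper: the Euler-Maruyama discretization, the noise scaling $\sqrt{2 \cdot \text{step} / \beta}$ (chosen so that the continuous-time limit has a Gibbs stationary distribution proportional to $\exp(-\beta L)$), the toy loss, and all function names are assumptions made for this example.

```python
import math
import random


def langevin_step(theta, grad, beta, step, rng):
    """One Euler-Maruyama step of d(theta) = -grad L dt + sqrt(2 / beta) dW.

    The noise scaling is an assumed convention, not taken from the paper.
    """
    noise_scale = math.sqrt(2.0 * step / beta)
    return [t - step * g + noise_scale * rng.gauss(0.0, 1.0)
            for t, g in zip(theta, grad(theta))]


def quadratic_grad(theta):
    """Gradient of the toy loss L(theta) = 0.5 * ||theta||^2 (hypothetical)."""
    return list(theta)


def simulate(beta=4.0, step=0.02, n_steps=400, n_chains=2000, seed=0):
    """Run independent 1-D chains from theta_0 = 0 and return the final samples."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_chains):
        theta = [0.0]
        for _ in range(n_steps):
            theta = langevin_step(theta, quadratic_grad, beta, step, rng)
        finals.append(theta[0])
    return finals


samples = simulate()
# Empirical second moment of the final samples (the mean is zero by symmetry).
empirical_var = sum(s * s for s in samples) / len(samples)
```

For this quadratic loss the Gibbs distribution is a Gaussian with variance $1/\beta$, so `empirical_var` should land close to `0.25` for `beta=4.0`, up to discretization and sampling error.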

Abstract

We analyze the generalization gap (gap between the training and test errors) when training a potentially over-parametrized model using a Markovian stochastic training algorithm, initialized from some distribution $\theta_0 \sim p_0$. We focus on Langevin dynamics with a positive temperature $\beta^{-1}$, i.e. gradient descent on a training loss $L$ with infinitesimal step size, perturbed with $\beta^{-1}$-variance Gaussian noise, and lightly regularized or bounded. There, we bound the generalization gap, at any time during training, by $(\beta \mathbb{E} L(\theta_0) + \log(1/\delta)) / N$ with probability $1 - \delta$ over the dataset, where $N$ is the sample size, and $\mathbb{E} L(\theta_0) = O(1)$ with standard initialization scaling. In contrast to previous guarantees, we have no dependence on either training time or reliance on mixing, nor a dependence on dimensionality,…
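
To get a feel for the scale of this bound, here is a small sketch that evaluates the expression as printed in the abstract. The function name, argument names, and the example numbers are hypothetical. Whether the paper's formal statement wraps the expression in a square root or carries extra constants should be checked against the paper itself; the optional `take_sqrt` flag is an assumption added for convenience.

```python
import math


def gap_bound(beta, expected_initial_loss, n_samples, delta, take_sqrt=False):
    """Evaluate (beta * E[L(theta_0)] + log(1 / delta)) / N.

    beta is the inverse temperature, n_samples is the dataset size N, and
    delta is the failure probability over the draw of the dataset.
    """
    if beta <= 0 or n_samples <= 0 or not (0.0 < delta < 1.0):
        raise ValueError("need beta > 0, n_samples > 0, and 0 < delta < 1")
    value = (beta * expected_initial_loss + math.log(1.0 / delta)) / n_samples
    # take_sqrt is an assumed option, not something stated in the abstract.
    return math.sqrt(value) if take_sqrt else value


# Hypothetical numbers: E[L(theta_0)] = 1 reflects the O(1) scaling in the abstract.
small = gap_bound(beta=100.0, expected_initial_loss=1.0, n_samples=10_000, delta=0.01)
large = gap_bound(beta=100.0, expected_initial_loss=1.0, n_samples=20_000, delta=0.01)
```

Neither training time nor parameter count appears among the arguments, and doubling `n_samples` halves the value returned without the square root.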

Videos

Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes · SlidesLive

Taxonomy

Topics: Protein Structure and Dynamics · Gene Regulatory Network Analysis · Complex Network Analysis Techniques
