Equilibrium and non-Equilibrium regimes in the learning of Restricted Boltzmann Machines
Aurélien Decelle, Cyril Furtlehner, Beatriz Seoane

TL;DR
This paper investigates how the mixing time affects the training dynamics of Restricted Boltzmann Machines, revealing two regimes—equilibrium and out-of-equilibrium—that influence sampling efficiency and learning stability.
Contribution
It introduces the concept of mixing time as a key factor in RBM training, characterizes the two regimes, and discusses practical implications for choosing training parameters.
Findings
The mixing time increases as learning progresses, which often triggers a transition from one regime to the other.
Small k (the number of Monte Carlo steps used to approximate the gradient, as in contrastive divergence) leads to slow, out-of-equilibrium dynamics.
Large k promotes equilibrium, faster convergence, and more accurate modeling.
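To make the role of k concrete, the following is a minimal sketch of a k-step Gibbs estimate of the weight gradient for a binary RBM. It is an illustration only, not the authors' code: it assumes binary visible and hidden units and chains started from the data batch, and all names (gibbs_k_steps, cd_k_gradient, W, b_vis, b_hid) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_k_steps(v, W, b_vis, b_hid, k, rng):
    """Run k alternating Gibbs steps on a binary RBM, starting from the visible batch v."""
    for _ in range(k):
        p_h = sigmoid(v @ W + b_hid)                      # P(h=1 | v)
        h = (rng.random(p_h.shape) < p_h).astype(float)   # sample hidden units
        p_v = sigmoid(h @ W.T + b_vis)                    # P(v=1 | h)
        v = (rng.random(p_v.shape) < p_v).astype(float)   # sample visible units
    return v

def cd_k_gradient(v_data, W, b_vis, b_hid, k, rng):
    """Approximate weight gradient: data correlations minus correlations after k steps."""
    v_model = gibbs_k_steps(v_data, W, b_vis, b_hid, k, rng)
    ph_data = sigmoid(v_data @ W + b_hid)
    ph_model = sigmoid(v_model @ W + b_hid)
    n = v_data.shape[0]
    return (v_data.T @ ph_data - v_model.T @ ph_model) / n

# Toy usage with random data (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)
v_data = (rng.random((8, n_vis)) < 0.5).astype(float)
grad_W = cd_k_gradient(v_data, W, b_vis, b_hid, k=10, rng=rng)
```

In this sketch the second term of the gradient is only a faithful model average when k is large compared with the mixing time of the chain; when k is smaller, the samples still carry memory of their starting point, which corresponds to the out-of-equilibrium regime described above.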
Abstract
Training Restricted Boltzmann Machines (RBMs) has been challenging for a long time due to the difficulty of computing precisely the log-likelihood gradient. Over the past decades, many works have proposed more or less successful training recipes but without studying the crucial quantity of the problem: the mixing time, i.e. the number of Monte Carlo iterations needed to sample new configurations from a model. In this work, we show that this mixing time plays a crucial role in the dynamics and stability of the trained model, and that RBMs operate in two well-defined regimes, namely equilibrium and out-of-equilibrium, depending on the interplay between this mixing time of the model and the number of steps, k, used to approximate the gradient. We further show empirically that this mixing time increases with the learning, which often implies a transition from one regime to another as soon…
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
