Fast training and sampling of Restricted Boltzmann Machines
Nicolas Béreux, Aurélien Decelle, Cyril Furtlehner, Lorenzo Rosset, Beatriz Seoane

TL;DR
This paper introduces a novel training and sampling approach for Restricted Boltzmann Machines that leverages phase transition theory, enabling faster training, more efficient sampling, and better handling of structured data.
Contribution
It presents a new pre-training method, a smooth annealing trajectory, and the Parallel Trajectory Tempering sampling strategy, improving efficiency and accuracy in RBM training and sampling.
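The replica-exchange idea behind Parallel Trajectory Tempering can be sketched as a swap move between two RBMs saved at different points of the training trajectory. This is a minimal sketch, not the authors' implementation. It assumes binary {0,1} units, the standard RBM energy, and an exchange of full (v, h) configurations accepted with the usual Metropolis rule for replica exchange. The function names `rbm_energy` and `swap_accept_prob` are hypothetical.

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """Joint energy E(v, h) = -a.v - b.h - v^T W h (assumed {0,1} units)."""
    return -(v @ a) - (h @ b) - (v @ W @ h)

def swap_accept_prob(state_i, state_j, params_i, params_j):
    """Metropolis probability of exchanging the configurations held by two models.

    state_*  : tuple (v, h) currently held by each model
    params_* : tuple (W, a, b) of each model (hypothetical layout)
    """
    (v_i, h_i), (v_j, h_j) = state_i, state_j
    # Energy change if model i receives x_j and model j receives x_i.
    delta = (rbm_energy(v_j, h_j, *params_i) + rbm_energy(v_i, h_i, *params_j)
             - rbm_energy(v_i, h_i, *params_i) - rbm_energy(v_j, h_j, *params_j))
    # min(1, exp(-delta)), written to avoid overflow when delta is very negative.
    return float(np.exp(min(0.0, -delta)))
```

When the two parameter sets are close, as for neighbouring checkpoints along a smooth trajectory, `delta` stays small and swaps are accepted often. That is the property a replica-exchange scheme relies on.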
Findings
Pre-training with low-rank RBMs improves handling of structured datasets.
PTT outperforms existing MCMC methods in sampling speed.
The approach enables reliable log-likelihood estimation during training.
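To illustrate what "low-rank" means in the first finding, the snippet below truncates a coupling matrix to its leading singular directions. This is only a sketch of a rank-k truncation via SVD, under the assumption that the coupling matrix is a dense array `W` of shape (visible, hidden). It is not the paper's pre-training procedure, which this summary does not describe, and `low_rank_coupling` is a hypothetical name.

```python
import numpy as np

def low_rank_coupling(W, k):
    """Best rank-k approximation of W in Frobenius norm, via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep the k largest singular values and their singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Usage on a random stand-in for a coupling matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 5))
W_k = low_rank_coupling(W, k=2)
```

The truncated matrix keeps only k directions of the visible and hidden spaces, so a model built on it can represent only the patterns aligned with those directions.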
Abstract
Restricted Boltzmann Machines (RBMs) are powerful tools for modeling complex systems and extracting insights from data, but their training is hindered by the slow mixing of Markov Chain Monte Carlo (MCMC) processes, especially with highly structured datasets. In this study, we build on recent theoretical advances in RBM training and focus on the stepwise encoding of data patterns into singular vectors of the coupling matrix, significantly reducing the cost of generating new samples and evaluating the quality of the model, as well as the training cost in highly clustered datasets. The learning process is analogous to the thermodynamic continuous phase transitions observed in ferromagnetic models, where new modes in the probability measure emerge in a continuous manner. We leverage the continuous transitions in the training process to define a smooth annealing trajectory that enables…
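The MCMC process mentioned in the abstract is, for a standard RBM, usually alternating (block) Gibbs sampling. A minimal sketch follows, assuming binary {0,1} units and the energy E(v, h) = -a·v - b·h - vᵀWh. The names `sigmoid` and `gibbs_step` are hypothetical, and nothing here is specific to the paper's method.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, a, b, rng):
    """One block Gibbs sweep for an RBM with {0,1} units.

    v : (batch, n_visible) array of current visible states
    W : (n_visible, n_hidden) coupling matrix
    a : visible biases, b : hidden biases
    """
    # Sample all hidden units in parallel given the visible layer.
    p_h = sigmoid(b + v @ W)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Sample all visible units in parallel given the hidden layer.
    p_v = sigmoid(a + h @ W.T)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h
```

Slow mixing shows up when chains produced by repeated calls to `gibbs_step` stay trapped in one cluster of the data for many sweeps. This is the regime the abstract associates with highly structured datasets.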
Peer Reviews
Decision: ICLR 2025 Poster
RBMs are an important model and finding appropriate ways to train them is a topic of significant interest. The paper highlights the phenomenon of critical slowing down and how pre-training the model with a low-rank approximation of the parameter matrix can help the model overcome some of the slowing down effects.
The paper suffers from a lack of clarity of presentation and lack of clarity of novelty. The paper mentions that the idea of a low-rank approach has already been used by others and it's unclear to me what novelty there is in any of the sampling approaches used after the pre-training phase. In terms of presentation, there are notational inconsistencies and a general lack of clarity in terms of the main ideas. Fundamentally the approach of fitting a constrained model seems straightforward and in
- The paper is well-written and easy to follow.
- It represents a pleasant read that is accessible to a broad audience.
- The literature review and related work section read well and are exhaustive.
- The idea of pre-training the RBM to encode the principal components is simple yet very effective.
- Leveraging the analogy between critical slowing down and the struggle of RBM during training to be ergodic and discovering all modes of the distributions is elegant and intuitive (though I suppose

- I find it a bit challenging to identify the two main contributions in the paper as those are totally disentangled in their presentation between Sec. 4 and Sec. 5.2. I strongly recommend adding a list of bullet points at the end of section 1 to clearly list the contributions of work and crossref to the corresponding point in the paper. This would substantially help navigate the paper.
- I find that the structure of sections 5.2 and 5.2.1 can be improved. In particular, I find it confusing tha

- The paper offers a novel contribution by proposing a pre-training technique and a new sampling approach for RBMs inspired by their thermodynamic properties. This builds on the existing theoretical analyses of RBMs.
- To my knowledge, extending replica Monte Carlo methods to a learning trajectory is original and intriguing.
- Including a specialized physics background in the Appendix makes the paper accessible even to readers without a physics background.
The distinction between theoretical claims and empirical findings is not clear. It would be beneficial for the authors to clarify which parts of the study are based on theoretical analysis and which are supported by numerical experiments, particularly in the context of related work. For instance, the first- and second-order phase transition claims pertain to equilibrium properties. However, it is unclear how these phase transitions are justified when updating parameters with limited samples. -
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Lattice Boltzmann Simulation Studies
