Learning Boltzmann Generators via Constrained Mass Transport
Christopher von Klitzing, Denis Blessing, Henrik Schopmans, Pascal Friederich, Gerhard Neumann

TL;DR
This paper introduces Constrained Mass Transport, a new variational framework for Boltzmann generators that improves sampling efficiency and diversity in high-dimensional, multimodal distributions, especially in molecular systems.
Contribution
The paper proposes Constrained Mass Transport, a variational framework that constrains both the KL divergence and the entropy decay between successive intermediate distributions to improve Boltzmann generator performance and prevent mode collapse.
Findings
CMT outperforms existing methods on standard benchmarks.
Achieves over 2.5x higher effective sample size.
Successfully applied to a large molecular system without prior samples.
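The findings above are stated in terms of effective sample size (ESS), which this page does not define. The sketch below shows the commonly used Kish estimator for importance-sampling ESS, assuming the inputs are unnormalized log importance weights (log target density minus log generator density for each sample). The function name `effective_sample_size` is hypothetical, and the paper's exact ESS definition may differ.

```python
import numpy as np

def effective_sample_size(log_weights):
    """Normalized Kish ESS in (0, 1] from unnormalized log importance weights.

    Assumption: log_weights[i] = log p_tilde(x_i) - log q(x_i) for samples
    x_i drawn from the generator q. This is a generic estimator, not
    necessarily the exact metric reported in the paper.
    """
    log_w = np.asarray(log_weights, dtype=float)
    log_w = log_w - log_w.max()   # shift for numerical stability; ESS is scale-invariant
    w = np.exp(log_w)
    return (w.sum() ** 2) / (len(w) * (w ** 2).sum())
```

A value of 1 means all samples carry equal weight; a value near 1/N means a single sample dominates the estimate.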
Abstract
Efficient sampling from high-dimensional and multimodal unnormalized probability distributions is a central challenge in many areas of science and machine learning. We focus on Boltzmann generators (BGs) that aim to sample the Boltzmann distribution of physical systems, such as molecules, at a given temperature. Classical variational approaches that minimize the reverse Kullback-Leibler divergence are prone to mode collapse, while annealing-based methods, commonly using geometric schedules, can suffer from mass teleportation and rely heavily on schedule tuning. We introduce Constrained Mass Transport (CMT), a variational framework that generates intermediate distributions under constraints on both the KL divergence and the entropy decay between successive steps. These constraints enhance distributional overlap, mitigate mass teleportation, and counteract premature convergence. Across…
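To give a rough feel for what constraining the KL divergence and the entropy decay between successive steps can mean, the toy sketch below picks an annealing schedule adaptively on a 1D grid. It is not the paper's algorithm: CMT trains a generator, whereas this sketch restricts the intermediates to a geometric bridge between a broad Gaussian and a bimodal target and only chooses the step sizes. The densities, the thresholds `eps_kl` and `eps_ent`, the KL direction, and all function names are assumptions made for illustration.

```python
import numpy as np

# 1D grid for a toy problem (all densities below are illustrative assumptions).
x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]

log_q0 = -0.5 * (x / 3.0) ** 2                       # broad Gaussian base, unnormalized
log_p = np.logaddexp(-0.5 * ((x + 3.0) / 0.5) ** 2,
                     -0.5 * ((x - 3.0) / 0.5) ** 2)  # bimodal toy target, unnormalized

def density(beta):
    """Normalized grid density of the geometric bridge q0^(1-beta) * p^beta."""
    log_pi = (1.0 - beta) * log_q0 + beta * log_p
    pi = np.exp(log_pi - log_pi.max())
    return pi / (pi.sum() * dx)

def kl(a, b):
    """KL(a || b) approximated by a Riemann sum on the grid."""
    return np.sum(a * (np.log(a) - np.log(b))) * dx

def entropy(a):
    """Differential entropy approximated by a Riemann sum on the grid."""
    return -np.sum(a * np.log(a)) * dx

def next_beta(beta, eps_kl=0.05, eps_ent=0.1, iters=40):
    """Bisection for a step that satisfies both constraints.

    Assumed constraints: KL(new || current) <= eps_kl and
    entropy(current) - entropy(new) <= eps_ent.
    """
    cur = density(beta)

    def feasible(b):
        new = density(b)
        return kl(new, cur) <= eps_kl and entropy(cur) - entropy(new) <= eps_ent

    if feasible(1.0):
        return 1.0
    lo, hi = beta, 1.0            # invariant: lo is feasible, hi is not
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Build the schedule from the base (beta = 0) to the target (beta = 1).
betas = [0.0]
while betas[-1] < 1.0 and len(betas) < 1000:
    betas.append(next_beta(betas[-1]))

print(len(betas) - 1, "steps")
```

In this toy setting the schedule takes small steps early, where the bridge changes quickly, and larger steps near the target, in contrast to a fixed, hand-tuned schedule.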
Peer Reviews
Decision: ICLR 2026 Poster
* The idea of learning an adaptive schedule for the annealing path is conceptually interesting and well-motivated.
* The paper is well-written and easy to follow. The mathematical details are clearly presented, and the proofs appear sound as far as I could verify.
* **Unclear specification of the number of annealing steps.** The number of annealing steps ($I$) appears to be a hyperparameter but is never explicitly stated. From Table 7, it seems to be roughly 200 steps for ALDP (400k total / 2000 per step ≈ 200).
* This number is quite large, and it is unclear whether the reported gains over TA-BG stem from the adaptive schedule itself or simply from using many more steps. Ablations comparing CMT to TA-BG with an equivalent number of steps would clarif
The paper proposes a reasonably simple protocol for optimizing the KL divergence under additional constraints. The authors derive the optimal distribution under their constraints, which is very similar to the one in the classical reference Neal 2001. Their experiments are quite convincing that the method performs better than the others without suffering from mode collapse.
In my opinion, the authors do not clearly explain why their method should work better than other annealing schemes. At the very least, it is not clearly explained why constraining the entropy and the KL divergence should address mass teleportation. An analytical or intuitive explanation, possibly on a toy example, of why this new optimization scheme avoids the problem would add considerable value.
The paper is clearly written and easy to follow. The motivation for combining trust-region and entropy constraints is well explained, and the authors do a good job connecting these ideas to the practical issues of maintaining overlap and avoiding mode collapse during annealing. While the individual components are known, their combination in this specific setting is original and well justified. The experimental section is strong and demonstrates clear empirical gains. Introducing the ELIL tetrape
From a technical standpoint, the contribution is somewhat incremental. The main novelty is combining two existing constraints (trust-region and entropy regularization) within a single variational framework, rather than introducing a new theoretical ingredient. This does not detract from the paper given the strong empirical results and clear exposition.
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Quantum many-body systems · Model Reduction and Neural Networks
