Accelerated Parallel Tempering via Neural Transports

Leo Zhang; Peter Potaptchik; Jiajun He; Yuanqi Du; Arnaud Doucet; Francisco Vargas; Hai-Dang Dau; Saifuddin Syed

arXiv:2502.10328·stat.ML·March 26, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a neural transport framework to accelerate Parallel Tempering MCMC, improving sampling efficiency for complex distributions while maintaining theoretical guarantees.

Contribution

It proposes integrating neural samplers into Parallel Tempering to reduce overlap requirements and computational costs, a novel approach in MCMC acceleration.

Findings

01. Improves sample quality in multimodal problems.

02. Reduces computational cost compared to classical PT.

03. Enables efficient free energy estimation.

Abstract

Markov Chain Monte Carlo (MCMC) algorithms are essential tools in computational statistics for sampling from unnormalised probability distributions, but can be fragile when targeting high-dimensional, multimodal, or complex target distributions. Parallel Tempering (PT) enhances MCMC's sample efficiency through annealing and parallel computation, propagating samples from tractable reference distributions to intractable targets via state swapping across interpolating distributions. The effectiveness of PT is limited by the often minimal overlap between adjacent distributions in challenging problems, which requires increasing the computational resources to compensate. We introduce a framework that accelerates PT by leveraging neural samplers -- including normalising flows, diffusion models, and controlled diffusions -- to reduce the required overlap. Our approach utilises neural samplers…
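To make the state-swapping mechanism described in the abstract concrete, the following is a minimal sketch of classical Parallel Tempering, not the neural-transport variant the paper proposes. Everything in it is an illustrative assumption: the one-dimensional bimodal target, the Gaussian reference, the linear grid of inverse temperatures, and the function names `log_ref`, `log_target`, `log_annealed`, and `parallel_tempering`. Chains target the annealed densities proportional to reference^(1 - beta) * target^beta, and adjacent chains swap states with the standard Metropolis acceptance probability.

```python
import numpy as np


def log_ref(x):
    # Assumed tractable reference: N(0, 3^2), unnormalised log density.
    return -0.5 * (x / 3.0) ** 2


def log_target(x):
    # Assumed bimodal target: equal mixture of N(-4, 0.5^2) and N(4, 0.5^2),
    # unnormalised log density.
    left = -0.5 * ((x + 4.0) / 0.5) ** 2
    right = -0.5 * ((x - 4.0) / 0.5) ** 2
    return np.logaddexp(left, right)


def log_annealed(x, beta):
    # Geometric interpolation between reference (beta=0) and target (beta=1).
    return (1.0 - beta) * log_ref(x) + beta * log_target(x)


def parallel_tempering(n_chains=8, n_iters=5000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_chains)
    x = rng.normal(0.0, 3.0, size=n_chains)  # one state per chain
    target_samples = np.empty(n_iters)
    swap_accepts, swap_attempts = 0, 0

    for t in range(n_iters):
        # Local exploration: one random-walk Metropolis step per chain.
        prop = x + step * rng.normal(size=n_chains)
        log_alpha = log_annealed(prop, betas) - log_annealed(x, betas)
        accept = np.log(rng.uniform(size=n_chains)) < log_alpha
        x = np.where(accept, prop, x)

        # Communication: alternate between even and odd adjacent pairs.
        for i in range(t % 2, n_chains - 1, 2):
            w_i = log_target(x[i]) - log_ref(x[i])
            w_j = log_target(x[i + 1]) - log_ref(x[i + 1])
            # Log of the swap acceptance ratio for the geometric path.
            log_swap = (betas[i + 1] - betas[i]) * (w_i - w_j)
            swap_attempts += 1
            if np.log(rng.uniform()) < log_swap:
                x[i], x[i + 1] = x[i + 1], x[i]
                swap_accepts += 1

        target_samples[t] = x[-1]  # state of the beta=1 chain

    return target_samples, swap_accepts / swap_attempts


if __name__ == "__main__":
    samples, swap_rate = parallel_tempering()
    print("mean swap acceptance rate:", swap_rate)
    print("fraction of target samples in right mode:", np.mean(samples > 0))
```

When adjacent annealed distributions overlap poorly, the swap acceptance rate in this sketch drops, and the usual remedy is to add more chains. That is the cost the abstract says the neural samplers are meant to reduce.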

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 5

Strengths

1. The mathematical derivation is sound.
2. PT is known to be an effective sampler for multimodal simulations. If the diffusion-enhanced sampler really works, PT will surely boost the performance of multimodal simulation.

Weaknesses

1. I don't like the whole community that utilizes ideas like the diffusion model to do sampling, which is super expensive and doesn't make a lot of sense. I have extensive research experience in sampling and diffusion models, but I don't think this is a nice combination.
2. The motivation for why and when we need diffusion models to do sampling is not well supported. The intuition for why a backward process is needed is not explained clearly.
3. For Section 6.1, measuring the round trip is not a good

Reviewer 02 · Rating 2 · Confidence 4

Strengths

* The paper addresses an important problem and demonstrates clear improvements in round-trip rates for parallel tempering.

Weaknesses

* The paper is very poorly written. The presentation based on the Jarzynski framework makes it difficult to follow. A formulation directly grounded in the Metropolis–Hastings framework, with clearly defined target distribution and proposal kernel (see [1]), would greatly improve readability and conceptual clarity.
* The novelty is rather limited. The deterministic case has already been covered in [2] (as acknowledged by the authors), while the stochastic variant represents only a modest generali

Reviewer 03 · Rating 8 · Confidence 4

Strengths

- Although related to recent literature and not unexpected in this respect, the proposed algorithm is novel, and the authors do a great job of presenting the method in general before showcasing different possible applications with different types of generative modeling ideas.
- The article also does a great job of connecting its method to the adjacent literature.
- The method is justified by proofs of consistency. I have not read the proofs in detail, but the results appear reasonable and the a

Weaknesses

- It would be desirable for the main text to place more emphasis on the training of the neural networks, and on the fact that this is not a trivial question in this sampling setting. For instance, for NF-APT, the authors state in Appendix C.1.2 that the retained strategy is to first run PT to be able to train the flows, which is arguably an important limitation.
- The introduction is not always fair to the adjacent literature - line 072 - “However, these methods usually incur a bias, fore

Taxonomy

Topics: Modular Robots and Swarm Intelligence · DNA and Biological Computing · Algorithms and Data Compression