# Non-Reversible Parallel Tempering: a Scalable Highly Parallel MCMC Scheme

**Authors:** Saifuddin Syed, Alexandre Bouchard-Côté, George Deligiannidis, Arnaud Doucet

arXiv: 1905.02939 · 2021-07-28

## TL;DR

This paper shows that a class of non-reversible parallel tempering (PT) schemes for MCMC outperforms its reversible counterparts, derives distinct scaling limits for the two, identifies the optimal annealing schedule for non-reversible PT, and gives an iterative scheme that approximates this schedule when sampling complex distributions.

## Contribution

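The paper formalizes the sharp divide in behaviour and performance between reversible and non-reversible PT schemes, shows theoretically and empirically that a class of non-reversible schemes dominates, and develops an iterative scheme that approximates the optimal annealing schedule for non-reversible PT.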

## Key findings

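- A class of non-reversible PT methods dominates its reversible counterparts, both theoretically and empirically.
- The scaling limits differ: the non-reversible scheme converges to a piecewise-deterministic Markov process, the reversible scheme to a diffusion.
- The optimal annealing schedule for non-reversible PT is identified, and an iterative scheme approximates it.
- A wide range of numerical examples supports the theoretical and methodological contributions.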
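The contrast between the two kinds of scheme can be illustrated with a toy simulation of how replicas travel across the chains. This is a minimal sketch, not the authors' code. It assumes the non-reversible scheme alternates deterministically between even and odd swap pairs, while the reversible one picks the parity at random at each iteration. It also assumes every proposed swap is rejected with the same fixed probability `reject_prob`, independently of the states, which is an idealization. The function name and its parameters are hypothetical.

```python
import random


def count_round_trips(n_chains, n_iters, reject_prob, deterministic, seed=0):
    """Count round trips (chain 0 -> top chain -> chain 0), summed over replicas."""
    rng = random.Random(seed)
    replica_at = list(range(n_chains))   # replica_at[i]: replica currently at chain i
    heading = [None] * n_chains          # per replica: None, "up" or "down"
    heading[replica_at[0]] = "up"        # the replica starting at chain 0 heads up
    round_trips = 0
    for n in range(n_iters):
        # Assumed schedules: deterministic alternation (non-reversible)
        # versus a uniformly random parity (reversible).
        parity = n % 2 if deterministic else rng.randrange(2)
        for i in range(parity, n_chains - 1, 2):
            if rng.random() >= reject_prob:  # swap accepted (idealized constant rate)
                replica_at[i], replica_at[i + 1] = replica_at[i + 1], replica_at[i]
        bottom, top = replica_at[0], replica_at[-1]
        if heading[top] == "up":
            heading[top] = "down"        # reached the top chain after visiting chain 0
        if heading[bottom] == "down":
            round_trips += 1             # back at chain 0: one round trip completed
        heading[bottom] = "up"
    return round_trips


deo = count_round_trips(16, 5000, 0.2, deterministic=True)
seo = count_round_trips(16, 5000, 0.2, deterministic=False)
print("deterministic even-odd:", deo, "| stochastic even-odd:", seo)
```

Under these assumptions a replica in the deterministic schedule keeps moving in one direction until a swap is rejected, whereas under the random schedule it performs a random walk over the chains, so the deterministic run should complete noticeably more round trips.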

## Abstract

Parallel tempering (PT) methods are a popular class of Markov chain Monte Carlo schemes used to sample complex high-dimensional probability distributions. They rely on a collection of $N$ interacting auxiliary chains targeting tempered versions of the target distribution to improve the exploration of the state-space. We provide here a new perspective on these highly parallel algorithms and their tuning by identifying and formalizing a sharp divide in the behaviour and performance of reversible versus non-reversible PT schemes. We show theoretically and empirically that a class of non-reversible PT methods dominates its reversible counterparts and identify distinct scaling limits for the non-reversible and reversible schemes, the former being a piecewise-deterministic Markov process and the latter a diffusion. These results are exploited to identify the optimal annealing schedule for non-reversible PT and to develop an iterative scheme approximating this schedule. We provide a wide range of numerical examples supporting our theoretical and methodological contributions. The proposed methodology is applicable to sample from a distribution $\pi$ with a density $L$ with respect to a reference distribution $\pi_0$ and compute the normalizing constant. A typical use case is when $\pi_0$ is a prior distribution, $L$ a likelihood function and $\pi$ the corresponding posterior.
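To make the abstract's setup concrete, the following writes out one common way to build the tempered targets. It is a sketch under stated assumptions: the power path $\pi_\beta \propto \pi_0 L^\beta$, the schedule notation $\beta_i$, the use of densities with respect to a common base measure, and the swap acceptance probability $\alpha_i$ are standard parallel tempering conventions that the abstract does not spell out.

$$
\pi(x) = \frac{\pi_0(x)\,L(x)}{Z}, \qquad Z = \int L(x)\,\pi_0(x)\,\mathrm{d}x
$$

$$
\pi_{\beta}(x) \propto \pi_0(x)\,L(x)^{\beta}, \qquad 0 = \beta_0 < \beta_1 < \dots < \beta_N = 1
$$

$$
\alpha_i = \min\left\{1,\ \exp\!\Big((\beta_{i+1} - \beta_i)\big(\log L(x_i) - \log L(x_{i+1})\big)\Big)\right\}
$$

Here $x_i$ denotes the current state of the chain targeting $\pi_{\beta_i}$, and $\alpha_i$ is the Metropolis probability of accepting a swap of the states of chains $i$ and $i+1$. At $\beta = 0$ the chain samples the reference $\pi_0$, and at $\beta = 1$ it samples the target $\pi$. The annealing schedule is the choice of the $\beta_i$.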

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.02939/full.md

## Figures

87 figures with captions in the complete paper: https://tomesphere.com/paper/1905.02939/full.md

## References

74 references — full list in the complete paper: https://tomesphere.com/paper/1905.02939/full.md

---
Source: https://tomesphere.com/paper/1905.02939