Non-reversible Parallel Tempering for Deep Posterior Approximation

Wei Deng; Qian Zhang; Qi Feng; Faming Liang; Guang Lin

arXiv:2211.10837 · cs.LG · November 22, 2022

TL;DR

This paper introduces a non-reversible parallel tempering method for posterior approximation in big data scenarios. It generalizes the deterministic even-odd (DEO) swap scheme to reduce communication cost and uses SGD with large, constant learning rates for exploration.

Contribution

The paper generalizes the deterministic even-odd (DEO) scheme to promote non-reversibility and proposes solutions to the bias caused by the geometric stopping time, achieving an $O(P \log P)$ communication cost in big data settings.

Findings

01

Achieves $O(P \log P)$ communication cost in big data scenarios.

02

Utilizes SGD with large, constant learning rates for efficient exploration.

03

Effectively approximates complex posteriors with minimal tuning.

Abstract

Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulations of multi-modal distributions. The key to the success of PT is to adopt efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits the non-reversibility property and has successfully reduced the communication cost from $O(P^{2})$ to $O(P)$ given sufficiently many $P$ chains. However, such an innovation largely disappears in big data due to the limited chains and few bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote non-reversibility and propose a few solutions to tackle the underlying bias caused by the geometric stopping time. Notably, in big data scenarios, we obtain an appealing communication cost $O(P \log P)$ based on the optimal window size. In addition, we also adopt stochastic gradient descent (SGD) with large and constant learning…
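
For readers unfamiliar with the baseline, the following is a minimal Python sketch of the standard DEO swap schedule that the abstract refers to, not the generalized scheme proposed in the paper. It assumes $P$ chains targeting densities proportional to $\exp(-\beta_p U(x))$ and uses exact energies, so it omits the bias correction needed for mini-batch energy estimates. The function names (`deo_pairs`, `swap_accept_prob`, `deo_swap_step`) are hypothetical and chosen only for illustration.

```python
import math
import random


def deo_pairs(iteration, num_chains):
    """Adjacent chain pairs (i, i + 1) proposed at this iteration under DEO."""
    # Even iterations propose (0,1), (2,3), ...; odd iterations propose (1,2), (3,4), ...
    start = iteration % 2
    return [(i, i + 1) for i in range(start, num_chains - 1, 2)]


def swap_accept_prob(beta_i, beta_j, energy_i, energy_j):
    """Metropolis swap probability for targets proportional to exp(-beta * U(x))."""
    log_ratio = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def deo_swap_step(iteration, states, betas, energy, rng=random):
    """Attempt every DEO swap for one iteration; modifies `states` in place."""
    for i, j in deo_pairs(iteration, len(states)):
        p = swap_accept_prob(betas[i], betas[j], energy(states[i]), energy(states[j]))
        if rng.random() < p:
            states[i], states[j] = states[j], states[i]
    return states
```

In a full sampler, each call to `deo_swap_step` would be interleaved with local exploration steps on every chain. Alternating the even and odd pair sets deterministically, rather than choosing a pair at random, is what makes the index process of a replica non-reversible.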

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Generative Adversarial Networks and Image Synthesis