Non-reversible Parallel Tempering for Deep Posterior Approximation
Wei Deng, Qian Zhang, Qi Feng, Faming Liang, Guang Lin

TL;DR
This paper introduces a non-reversible parallel tempering method that reduces communication costs and improves posterior approximation in big data scenarios. It does so by generalizing the deterministic even-odd (DEO) swap scheme and by using SGD for exploration.
Contribution
It generalizes the deterministic even-odd (DEO) scheme to promote non-reversibility and proposes several solutions to the bias caused by the geometric stopping time, achieving lower communication costs in large-scale data settings.
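To make the DEO scheme concrete, here is a minimal sketch of one DEO communication step. It assumes chain p targets exp(-U(x)/tau_p) with an increasing temperature ladder and uses the standard replica-exchange Metropolis acceptance. The function names `swap_accept_prob` and `deo_sweep` are hypothetical. The local exploration moves that a real sampler runs between sweeps (for example, SGD or Langevin steps) are omitted, and the paper's generalized, window-based scheme and bias corrections are not reproduced here.

```python
import math
import random
from typing import Callable, List


def swap_accept_prob(u_low: float, u_high: float, tau_low: float, tau_high: float) -> float:
    """Metropolis acceptance probability for exchanging the states of two chains.

    Chain `low` targets exp(-U/tau_low), chain `high` targets exp(-U/tau_high),
    with tau_low < tau_high; u_low and u_high are their current energies.
    """
    log_ratio = (1.0 / tau_low - 1.0 / tau_high) * (u_low - u_high)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def deo_sweep(states: List[float], energy: Callable[[float], float],
              taus: List[float], iteration: int, rng: random.Random) -> List[int]:
    """One DEO communication step (hypothetical helper).

    Even iterations try pairs (0,1), (2,3), ...; odd iterations try (1,2), (3,4), ...
    Accepted swaps are applied to `states` in place.
    Returns the lower indices of the pairs that were swapped.
    """
    swapped = []
    for p in range(iteration % 2, len(taus) - 1, 2):
        a = swap_accept_prob(energy(states[p]), energy(states[p + 1]), taus[p], taus[p + 1])
        if rng.random() < a:
            states[p], states[p + 1] = states[p + 1], states[p]
            swapped.append(p)
    return swapped


# Toy usage: four chains on U(x) = x^2 with a geometric temperature ladder.
# Every proposed swap here moves a higher-energy state to a hotter chain, so all are accepted.
rng = random.Random(0)
taus = [1.0, 2.0, 4.0, 8.0]
states = [3.0, 2.0, 1.0, 0.0]
for t in range(2):
    deo_sweep(states, lambda x: x * x, taus, t, rng)
```

The only difference from a reversible scheme is the line `iteration % 2`. Alternating the parity deterministically, instead of drawing it at random, is what makes the swap process non-reversible.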
Findings
Achieves an $O(P\log P)$ communication cost in big data scenarios.
Utilizes SGD with large, constant learning rates for efficient exploration.
Effectively approximates complex posteriors with minimal tuning.
Abstract
Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulations of multi-modal distributions. The key to the success of PT is to adopt efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits the non-reversibility property and has successfully reduced the communication cost from $O(P^2)$ to $O(P)$ given sufficiently many chains. However, this advantage largely disappears in big data because of the limited number of chains and the few bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote non-reversibility and propose a few solutions to tackle the underlying bias caused by the geometric stopping time. Notably, in big data scenarios, we obtain an appealing communication cost of $O(P\log P)$ based on the optimal window size. In addition, we also adopt stochastic gradient descent (SGD) with large and constant learning…
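Communication cost in PT is usually measured by round trips of replicas between the lowest and highest temperatures. The toy simulation below illustrates why non-reversibility helps. It is a sketch under a strong simplifying assumption that is not the paper's setting: every proposed adjacent swap is accepted independently with the same fixed probability. It compares DEO against a reversible baseline that picks the even or odd pair set at random each iteration (often called stochastic even-odd, SEO). The function `count_round_trips` is hypothetical.

```python
import random
from typing import List, Optional


def count_round_trips(num_chains: int, accept_prob: float, num_iters: int,
                      deterministic: bool, seed: int = 0) -> int:
    """Count bottom -> top -> bottom round trips of replicas on a temperature ladder.

    Idealization (an assumption): every proposed swap between adjacent chains is
    accepted independently with probability `accept_prob`.
    deterministic=True alternates even/odd pairs (DEO); False draws the parity
    uniformly at random each iteration (reversible SEO baseline).
    """
    if num_chains < 2:
        raise ValueError("need at least two chains")
    rng = random.Random(seed)
    perm = list(range(num_chains))  # perm[p] = replica currently at temperature slot p
    last_end: List[Optional[str]] = [None] * num_chains  # last extreme slot each replica visited
    last_end[perm[0]] = "bottom"
    trips = 0
    for t in range(num_iters):
        parity = t % 2 if deterministic else rng.randrange(2)
        for p in range(parity, num_chains - 1, 2):
            if rng.random() < accept_prob:
                perm[p], perm[p + 1] = perm[p + 1], perm[p]
        bottom, top = perm[0], perm[-1]
        if last_end[bottom] == "top":
            trips += 1  # this replica came back down after reaching the top
        last_end[bottom] = "bottom"
        if last_end[top] == "bottom":
            last_end[top] = "top"  # only count tops reached after leaving the bottom
    return trips


deo = count_round_trips(16, 0.8, 10_000, deterministic=True)
seo = count_round_trips(16, 0.8, 10_000, deterministic=False)
print(f"DEO round trips: {deo}, SEO round trips: {seo}")
```

Under DEO a replica tends to keep moving in one direction along the ladder, while under SEO its index performs a random walk. That diffusive behavior is the intuition behind the $O(P^2)$ versus $O(P)$ gap when many chains are available. The simulation does not model the big-data regime or the windowed scheme that yields $O(P\log P)$.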
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Generative Adversarial Networks and Image Synthesis
