Generalizing Hamiltonian Monte Carlo with Neural Networks

Daniel Levy; Matthew D. Hoffman; Jascha Sohl-Dickstein

arXiv:1711.09268·stat.ML·March 6, 2018·25 cites

TL;DR

This paper introduces a neural network-based generalization of Hamiltonian Monte Carlo. On a set of simple but challenging distributions it improves effective sample size by up to 106x, and it shows gains on a real-world latent-variable generative modeling task.

Contribution

It proposes a deep neural network parameterization of MCMC kernels that generalizes HMC and is trained to maximize expected squared jumped distance, a proxy for mixing speed, with demonstrated empirical improvements.

Findings

01 Achieved up to a 106x increase in effective sample size.

02 Enabled mixing where standard HMC makes no measurable progress.

03 Showed quantitative and qualitative gains on real-world latent-variable generative modeling.

Abstract

We present a general-purpose method to train Markov chain Monte Carlo kernels, parameterized by deep neural networks, that converge and mix quickly to their target distribution. Our method generalizes Hamiltonian Monte Carlo and is trained to maximize expected squared jumped distance, a proxy for mixing speed. We demonstrate large empirical gains on a collection of simple but challenging distributions, for instance achieving a 106x improvement in effective sample size in one case, and mixing when standard HMC makes no measurable progress in a second. Finally, we show quantitative and qualitative gains on a real-world task: latent-variable generative modeling. We release an open source TensorFlow implementation of the algorithm.
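The training objective named in the abstract, expected squared jumped distance, can be illustrated with plain HMC. The following is a minimal sketch, not the paper's method: it uses a standard leapfrog integrator with no neural networks, a standard Gaussian target chosen here for illustration, and hypothetical function names (`leapfrog`, `estimate_esjd`). It estimates the acceptance-weighted squared distance between current and proposed states, which is the quantity a learned kernel would be tuned to increase. The paper's actual loss and network parameterization are not reproduced here.

```python
import numpy as np


def leapfrog(x, v, grad_u, step_size, n_steps):
    """Standard leapfrog integrator for H(x, v) = U(x) + 0.5 * |v|^2."""
    v = v - 0.5 * step_size * grad_u(x)          # initial half step in momentum
    for step in range(n_steps):
        x = x + step_size * v                    # full step in position
        if step < n_steps - 1:
            v = v - step_size * grad_u(x)        # full step in momentum
    v = v - 0.5 * step_size * grad_u(x)          # final half step in momentum
    return x, v


def estimate_esjd(x, u, grad_u, step_size, n_steps, rng):
    """Monte Carlo estimate of expected squared jumped distance for plain HMC.

    x has shape (n_chains, dim); each row is treated as a draw from the target.
    """
    v = rng.standard_normal(x.shape)             # resample momentum
    x_new, v_new = leapfrog(x, v, grad_u, step_size, n_steps)
    h_old = u(x) + 0.5 * np.sum(v ** 2, axis=-1)
    h_new = u(x_new) + 0.5 * np.sum(v_new ** 2, axis=-1)
    # Metropolis-Hastings acceptance probability, clipped to avoid overflow.
    accept_prob = np.exp(np.minimum(0.0, h_old - h_new))
    sq_dist = np.sum((x_new - x) ** 2, axis=-1)
    return np.mean(accept_prob * sq_dist), accept_prob


# Illustrative target (an assumption, not from the paper): 2-D standard Gaussian.
def u(x):
    return 0.5 * np.sum(x ** 2, axis=-1)


def grad_u(x):
    return x


rng = np.random.default_rng(0)
x0 = rng.standard_normal((2000, 2))              # exact draws from the target

esjd_small, acc_small = estimate_esjd(x0, u, grad_u, 0.05, 10, rng)
esjd_large, acc_large = estimate_esjd(x0, u, grad_u, 0.5, 10, rng)
print("ESJD, step size 0.05:", esjd_small)
print("ESJD, step size 0.5: ", esjd_large)
```

In this toy setting the only tunable quantity is the step size, and the larger step size moves farther per accepted proposal. The paper's approach instead trains the parameters of a neural network kernel against an objective of this kind.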

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference