# On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models

**Authors:** Erik Nijkamp, Mitch Hill, Tian Han, Song-Chun Zhu, Ying Nian Wu

arXiv: 1903.12370 · 2019-12-02

## TL;DR

This paper studies how MCMC sampling affects maximum likelihood (ML) training of energy-based models with ConvNet potentials. It shows that stable training and realistic long-run samples are both achievable with proper tuning, contrary to previous assumptions about the instability of MCMC-based training.

## Contribution

The paper demonstrates that stable ML training of ConvNet energy models is possible without regularization, and that long-run MCMC samples can be realistic when the Langevin noise is correctly tuned.
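For reference, the standard ML setup behind these claims can be written as follows. This is the textbook formulation rather than a quotation from the paper; the symbols $U$ (energy), $\theta$ (ConvNet parameters), $x^{+}$ (data), $x^{-}$ (MCMC samples), $\varepsilon$ (Langevin step size) and $K$ (number of steps) are notational assumptions of this summary, and $\mathcal{L}$ denotes the negative log-likelihood.

$$
\begin{aligned}
p(x;\theta) &= \frac{1}{Z(\theta)} \exp\{-U(x;\theta)\}, \qquad Z(\theta) = \int \exp\{-U(x;\theta)\}\,dx \\
\nabla_\theta \mathcal{L}(\theta) &= \mathbb{E}_{p_{\text{data}}}\!\left[\nabla_\theta U(x;\theta)\right] - \mathbb{E}_{p(x;\theta)}\!\left[\nabla_\theta U(x;\theta)\right] \\
&\approx \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta U(x_i^{+};\theta) - \frac{1}{m}\sum_{j=1}^{m} \nabla_\theta U(x_j^{-};\theta) \\
x_{k+1} &= x_k - \frac{\varepsilon^2}{2}\,\nabla_x U(x_k;\theta) + \varepsilon\, z_k, \qquad z_k \sim \mathcal{N}(0, I), \quad k = 0,\dots,K-1
\end{aligned}
$$

The second expectation is where MCMC enters: the quality of the gradient estimate depends entirely on how the samples $x_j^{-}$ are produced, which is the dependence the paper dissects.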

## Key findings

- Short-run Langevin sampling initialized from noise produces realistic images.
- ML training can be stable with only a few hyper-parameters and no regularization.
- Correct tuning of the Langevin noise enables realistic long-run MCMC samples.
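Short-run Langevin sampling from a noise initialization, as referenced in the findings above, can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' code: the function names `langevin_sample` and `gaussian_energy_grad`, the toy quadratic energy standing in for a ConvNet potential, and the optional `noise_scale` argument (a hypothetical knob for separately tuned Langevin noise) are all introduced here for illustration.

```python
import random
from typing import Callable, List, Optional


def langevin_sample(
    grad_u: Callable[[List[float]], List[float]],
    x0: List[float],
    n_steps: int,
    step_size: float,
    noise_scale: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Run n_steps of unadjusted Langevin dynamics starting from x0.

    Update per step:
        x <- x - (step_size**2 / 2) * grad_u(x) + noise_scale * z,  z ~ N(0, I)
    The textbook sampler uses noise_scale == step_size (the default here).
    A different noise_scale is a hypothetical stand-in for separately tuned
    Langevin noise; it is not taken from the paper's implementation.
    """
    if rng is None:
        rng = random.Random()
    if noise_scale is None:
        noise_scale = step_size
    x = list(x0)
    for _ in range(n_steps):
        g = grad_u(x)
        x = [
            xi - 0.5 * step_size ** 2 * gi + noise_scale * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, g)
        ]
    return x


def gaussian_energy_grad(x: List[float], mu: float = 1.0) -> List[float]:
    # Gradient of the toy energy U(x) = sum_i (x_i - mu)**2 / 2,
    # whose steady state is N(mu, I). Stand-in for a ConvNet potential.
    return [xi - mu for xi in x]
```

For example, `langevin_sample(gaussian_energy_grad, [random.uniform(-1, 1) for _ in range(4)], 100, 0.2)` draws one short-run sample initialized from uniform noise. With a real ConvNet potential, `grad_u` would come from automatic differentiation rather than a closed form.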

## Abstract

This study investigates the effects of Markov chain Monte Carlo (MCMC) sampling in unsupervised Maximum Likelihood (ML) learning. Our attention is restricted to the family of unnormalized probability densities for which the negative log density (or energy function) is a ConvNet. We find that many of the techniques used to stabilize training in previous studies are not necessary. ML learning with a ConvNet potential requires only a few hyper-parameters and no regularization. Using this minimal framework, we identify a variety of ML learning outcomes that depend solely on the implementation of MCMC sampling.

On one hand, we show that it is easy to train an energy-based model which can sample realistic images with short-run Langevin. ML can be effective and stable even when MCMC samples have much higher energy than true steady-state samples throughout training. Based on this insight, we introduce an ML method with purely noise-initialized MCMC, high-quality short-run synthesis, and the same budget as ML with informative MCMC initialization such as CD or PCD. Unlike previous models, our energy model can obtain realistic high-diversity samples from a noise signal after training.

On the other hand, ConvNet potentials learned with non-convergent MCMC do not have a valid steady-state and cannot be considered approximate unnormalized densities of the training data because long-run MCMC samples differ greatly from observed images. We show that it is much harder to train a ConvNet potential to learn a steady-state over realistic images. To our knowledge, long-run MCMC samples of all previous models lose the realism of short-run samples. With correct tuning of Langevin noise, we train the first ConvNet potentials for which long-run and steady-state MCMC samples are realistic images.
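The abstract's contrast between short-run synthesis and a valid steady state can be reproduced qualitatively in one dimension. The sketch below is a toy under loud assumptions, not the paper's experiment: the energy $U(x;\theta) = (x-\theta)^2/2$ replaces a ConvNet, and the helper names (`grad_x_energy`, `grad_theta_energy`, `langevin_chains`, `train`) and all hyper-parameter values are hypothetical. Each iteration starts fresh chains from uniform noise, runs a short, non-convergent Langevin chain, and takes a gradient step on the estimated negative log-likelihood.

```python
import random
from typing import List


def grad_x_energy(x: float, theta: float) -> float:
    # Toy energy U(x; theta) = (x - theta)**2 / 2, i.e. N(theta, 1) up to Z(theta).
    return x - theta


def grad_theta_energy(x: float, theta: float) -> float:
    # dU/dtheta for the same toy energy.
    return -(x - theta)


def langevin_chains(theta: float, n_chains: int, n_steps: int,
                    step_size: float, rng: random.Random) -> List[float]:
    # Noise-initialized chains: x0 ~ Uniform(-1, 1), then n_steps Langevin updates.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_chains)]
    for _ in range(n_steps):
        xs = [
            x - 0.5 * step_size ** 2 * grad_x_energy(x, theta)
            + step_size * rng.gauss(0.0, 1.0)
            for x in xs
        ]
    return xs


def train(data: List[float], n_iters: int = 200, n_chains: int = 200,
          n_steps: int = 20, step_size: float = 0.2, lr: float = 0.5,
          seed: int = 0) -> float:
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(n_iters):
        # Fresh short-run chains from noise every iteration (no persistent buffer).
        negatives = langevin_chains(theta, n_chains, n_steps, step_size, rng)
        # Estimated NLL gradient: E_data[dU/dtheta] - E_samples[dU/dtheta].
        g = (
            sum(grad_theta_energy(x, theta) for x in data) / len(data)
            - sum(grad_theta_energy(x, theta) for x in negatives) / len(negatives)
        )
        theta -= lr * g
    return theta
```

With data drawn from N(2, 1), 20 steps of size 0.2 move each chain only about a third of the way from its noise initialization toward θ, so training settles where the short-run samples match the data mean while θ ends up near three times that mean. The short-run samples therefore sit far from the mode and keep a much higher energy than steady-state samples, yet learning is stable; running the same chains for many more steps drifts toward θ and away from the data, a one-dimensional analogue of a potential without a valid steady state over the data.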

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.12370/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1903.12370/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1903.12370/full.md

---
Source: https://tomesphere.com/paper/1903.12370