Convergence of Contrastive Divergence Algorithm in Exponential Family
Bai Jiang, Tung-Yu Wu, Yifan Jin, Wing H. Wong

TL;DR
This paper proves that, under conditions commonly met in practice, the time-averaged estimates produced by the Contrastive Divergence (CD) algorithm in exponential family models are consistent for the true parameters, which helps explain the algorithm's practical success in training energy-based models.
Contribution
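It provides the first rigorous proof of the asymptotic convergence properties of the CD algorithm in canonical exponential families, showing that, under commonly satisfied conditions, the sequence of estimates converges to a random walk around the maximum likelihood estimate (MLE).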
Findings
Limit points of the time-averaged estimates are consistent for the true parameters.
The sequence of estimates forms a Markov chain satisfying Foster-Lyapunov drift conditions.
The range of the random walk that the estimates perform around the MLE shrinks to zero at rate O(1/∛n) as the sample size n grows.
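To make these findings concrete, the CD iteration in a canonical exponential family can be written as below. This is a sketch under assumed notation, not necessarily the paper's exact formulation: $T$ is the sufficient statistic, $\Lambda$ the log-partition function, $\ell_n$ the average log-likelihood, $\eta$ a constant step size, $m$ the number of MCMC steps per iteration, $K_\theta$ an MCMC kernel that leaves $p_\theta$ invariant, and every iteration is assumed to use the full data sample.

```latex
\begin{aligned}
p_\theta(x) &= h(x)\,\exp\{\langle \theta, T(x)\rangle - \Lambda(\theta)\},
\qquad \nabla\Lambda(\theta) = \mathbb{E}_{\theta}[T(X)],\\
\nabla \ell_n(\theta) &= \frac{1}{n}\sum_{i=1}^{n} T(X_i) - \mathbb{E}_{\theta}[T(X)]
\qquad \text{(exact gradient; the expectation is intractable)},\\
\theta_{t+1} &= \theta_t + \eta\Big(\frac{1}{n}\sum_{i=1}^{n} T(X_i)
  - \frac{1}{n}\sum_{i=1}^{n} T\big(\tilde X_i^{(t)}\big)\Big),
\qquad \tilde X_i^{(t)} \sim K_{\theta_t}^{m}(X_i,\cdot),\\
\bar\theta_t &= \frac{1}{t}\sum_{s=0}^{t-1}\theta_s .
\end{aligned}
```

Given the data, $\theta_{t+1}$ depends only on $\theta_t$ and fresh MCMC randomness, which is why $\{\theta_t\}$ is a Markov chain. The MLE $\hat\theta_n$ is the root of $\nabla\ell_n$, i.e. it satisfies $\mathbb{E}_{\hat\theta_n}[T(X)] = \frac{1}{n}\sum_i T(X_i)$, and the consistency statement concerns the limit points of $\bar\theta_t$.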
Abstract
The Contrastive Divergence (CD) algorithm has achieved notable success in training energy-based models including Restricted Boltzmann Machines and played a key role in the emergence of deep learning. The idea of this algorithm is to approximate the intractable term in the exact gradient of the log-likelihood function by using short Markov chain Monte Carlo (MCMC) runs. The approximate gradient is computationally cheap but biased. Whether and why the CD algorithm provides an asymptotically consistent estimate are still open questions. This paper studies the asymptotic properties of the CD algorithm in canonical exponential families, which are special cases of energy-based models. Suppose the CD algorithm runs MCMC transition steps at each iteration and iteratively generates a sequence of parameter estimates given an i.i.d. data sample $\{X_i\}_{i=1}^n$…
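The approximation described in the abstract can be illustrated on the simplest possible case. The sketch below is a hypothetical toy, not the authors' code. It assumes the one-parameter family $p_\theta(x) \propto \exp(\theta x - x^2/2)$ (that is, $N(\theta, 1)$, with sufficient statistic $T(x) = x$ and MLE equal to the sample mean), a random-walk Metropolis kernel, a constant step size, and full-batch updates in which one chain starts at each data point. The helper names `metropolis_steps` and `contrastive_divergence` are illustrative.

```python
import numpy as np


def metropolis_steps(x, theta, m, rng, step=1.0):
    """Run m random-walk Metropolis steps, in parallel for every entry of x,
    targeting p_theta(x) ∝ exp(theta * x - x**2 / 2), i.e. N(theta, 1)."""
    x = x.copy()
    for _ in range(m):
        prop = x + step * rng.standard_normal(x.shape)
        # log p_theta(prop) - log p_theta(x); the normalizing constant cancels
        log_ratio = (theta * prop - prop**2 / 2) - (theta * x - x**2 / 2)
        accept = np.log(rng.random(x.shape)) < log_ratio
        x = np.where(accept, prop, x)
    return x


def contrastive_divergence(data, m=1, lr=0.5, n_iter=3000, theta0=0.0, seed=0):
    """CD-m for the toy family with sufficient statistic T(x) = x.

    Each iteration starts one chain at every data point, runs m MCMC steps,
    and replaces E_theta[T(X)] in the exact gradient by the average of T over
    the chain endpoints. Returns the trajectory theta_0, ..., theta_{n_iter-1}.
    """
    rng = np.random.default_rng(seed)
    data_mean = data.mean()  # (1/n) * sum_i T(X_i)
    theta = theta0
    trajectory = np.empty(n_iter)
    for t in range(n_iter):
        trajectory[t] = theta
        samples = metropolis_steps(data, theta, m, rng)
        grad = data_mean - samples.mean()  # cheap but biased CD gradient
        theta = theta + lr * grad
    return trajectory


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=1000)
    traj = contrastive_divergence(data, m=1)
    print("sample mean (MLE):", data.mean())
    print("CD time average:  ", traj.mean())
```

On data drawn from $N(2, 1)$, the iterates should settle into a small random walk around the sample mean, and their time average should land close to it even with a single MCMC step per iteration. In this Gaussian toy the exact gradient is actually tractable; it is used only because the MLE is known in closed form, which makes the behavior easy to check.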
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Markov Chains and Monte Carlo Methods · Stochastic Gradient Optimization Techniques
