Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines

Guillaume Desjardins; Razvan Pascanu; Aaron Courville; Yoshua Bengio

arXiv:1301.3545 · cs.LG · March 19, 2013 · ICLR · 19 cites

TL;DR

The paper proposes the Metric-Free Natural Gradient algorithm for more efficient training of Boltzmann Machines, avoiding explicit metric computation and showing faster convergence per epoch in joint-training tasks.

Contribution

It introduces a truncated Newton natural gradient method that uses an efficient matrix-vector product to avoid explicitly storing the metric, improving per-epoch convergence when training Boltzmann Machines.

Findings

01. Faster per-epoch convergence than Stochastic Maximum Likelihood (SML) with centering

02. An efficient matrix-vector product avoids explicitly storing the metric

03. Wall-clock performance is currently not competitive

Abstract

This paper introduces the Metric-Free Natural Gradient (MFNG) algorithm for training Boltzmann Machines. Similar in spirit to the Hessian-Free method of Martens [8], our algorithm belongs to the family of truncated Newton methods and exploits an efficient matrix-vector product to avoid explicitly storing the natural gradient metric $L$. This metric is shown to be the expected second derivative of the log-partition function (under the model distribution), or equivalently, the variance of the vector of partial derivatives of the energy function. We evaluate our method on the task of joint-training a 3-layer Deep Boltzmann Machine and show that MFNG does indeed have faster per-epoch convergence compared to Stochastic Maximum Likelihood with centering, though wall-clock performance is currently not competitive.
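The matrix-vector product mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the metric $L$ is estimated as the sample covariance of energy gradients evaluated at model samples, so that $Lv$ can be computed from two thin products without ever forming the full matrix. The function names, the `damping` term, and the plain conjugate-gradient solver are all assumptions made for illustration.

```python
import numpy as np


def metric_vector_product(G, v, damping=1e-3):
    """Return (L + damping * I) @ v without forming L.

    G has shape (n_samples, n_params). Row i is assumed to hold the
    gradient of the energy with respect to the parameters, evaluated at
    the i-th sample from the model. L is the sample covariance of the rows.
    The damping term is an illustrative assumption, not taken from the paper.
    """
    Gc = G - G.mean(axis=0)  # center the per-sample gradients
    return Gc.T @ (Gc @ v) / G.shape[0] + damping * v


def conjugate_gradient(matvec, b, max_iters=50, tol=1e-16):
    """Approximately solve A x = b using only products A @ p (truncated Newton style)."""
    x = np.zeros_like(b)
    r = b.copy()  # residual b - A x for the initial guess x = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        if rs < tol:  # squared residual norm is small enough
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


# Hypothetical data: random stand-ins for energy gradients and a likelihood gradient.
rng = np.random.default_rng(0)
energy_grads = rng.normal(size=(200, 10))
likelihood_grad = rng.normal(size=10)

# Natural gradient direction: solve (L + damping * I) d = likelihood_grad.
direction = conjugate_gradient(
    lambda u: metric_vector_product(energy_grads, u), likelihood_grad
)
```

Each product costs on the order of `n_samples * n_params` operations and memory, whereas storing $L$ explicitly would need `n_params ** 2` entries. That saving is the reason a truncated Newton scheme of this kind works through matrix-vector products.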

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications