Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines
Guillaume Desjardins, Razvan Pascanu, Aaron Courville, Yoshua Bengio

TL;DR
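The paper proposes the Metric-Free Natural Gradient (MFNG) algorithm for training Boltzmann Machines. MFNG never computes or stores the natural gradient metric explicitly, and it converges faster per epoch than Stochastic Maximum Likelihood when jointly training a Deep Boltzmann Machine, although it is not yet competitive in wall-clock time.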
Contribution
It introduces a natural gradient method of the truncated Newton family that replaces explicit computation of the metric with an efficient matrix-vector product, improving per-epoch convergence when training Boltzmann Machines.
Findings
Faster per-epoch convergence compared to Stochastic Maximum Likelihood
Efficient matrix-vector product avoids explicit metric storage
Wall-clock performance currently not competitive
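As a short worked equation for the matrix-vector product mentioned in the findings: the abstract describes the metric as the variance (covariance matrix) of the vector of partial derivatives of the energy under the model distribution. The notation below (g for that gradient vector, v for an arbitrary vector, N for the number of model samples) is assumed here for illustration and is not taken from the paper.

```latex
% Assumed notation: p(x) \propto \exp(-E(x;\theta)), \quad g(x) = \partial E(x;\theta) / \partial\theta
G = \mathbb{E}_{p}\!\left[ g\, g^{\top} \right] - \mathbb{E}_{p}[g]\, \mathbb{E}_{p}[g]^{\top}
% Product with a vector v, evaluated without ever forming G:
G v = \mathbb{E}_{p}\!\left[ g \,(g^{\top} v) \right] - \mathbb{E}_{p}[g]\; \mathbb{E}_{p}\!\left[ g^{\top} v \right]
% Monte Carlo estimate from samples x_1, \dots, x_N drawn from the model:
G v \approx \frac{1}{N} \sum_{i=1}^{N} \left( g(x_i) - \bar{g} \right) \left( g(x_i) - \bar{g} \right)^{\top} v,
\qquad \bar{g} = \frac{1}{N} \sum_{i=1}^{N} g(x_i)
```

Evaluated right to left, each term is a dot product followed by a scaled vector, so for n parameters the cost is O(Nn) time and the n-by-n matrix G is never stored.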
Abstract
This paper introduces the Metric-Free Natural Gradient (MFNG) algorithm for training Boltzmann Machines. Similar in spirit to the Hessian-Free method of Martens [8], our algorithm belongs to the family of truncated Newton methods and exploits an efficient matrix-vector product to avoid explicitly storing the natural gradient metric. This metric is shown to be the expected second derivative of the log-partition function (under the model distribution), or equivalently, the variance of the vector of partial derivatives of the energy function. We evaluate our method on the task of joint-training a 3-layer Deep Boltzmann Machine and show that MFNG does indeed have faster per-epoch convergence compared to Stochastic Maximum Likelihood with centering, though wall-clock performance is currently not competitive.
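The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the damping term, and the choice of plain conjugate gradient as the linear solver are all assumptions made for illustration. The sketch takes a matrix of per-sample energy gradients (one row per sample from the model), computes the product of their covariance with a vector without forming the covariance, and uses that product inside an iterative solver to approximate the natural gradient direction.

```python
import numpy as np

def metric_vector_product(grads, v, damping=0.0):
    """Return (Cov(grads) + damping * I) @ v without forming the covariance.

    grads: array of shape (n_samples, n_params); row i is the gradient of the
           energy with respect to the parameters at sample i (assumed input).
    damping: hypothetical regularizer, not taken from the paper.
    """
    centered = grads - grads.mean(axis=0)   # subtract the mean gradient
    proj = centered @ v                     # one dot product per sample
    return centered.T @ proj / grads.shape[0] + damping * v

def conjugate_gradient(matvec, b, n_iters=50, tol=1e-10):
    """Approximately solve A x = b given only the product matvec(p) = A @ p."""
    x = np.zeros_like(b)
    r = b.copy()        # residual b - A x, with x = 0 initially
    p = r.copy()        # search direction
    rs = r @ r
    for _ in range(n_iters):
        if rs < tol:    # squared residual norm is small enough
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Usage with synthetic data standing in for real energy gradients.
rng = np.random.default_rng(0)
sample_grads = rng.normal(size=(500, 20))   # 500 samples, 20 parameters
ml_gradient = rng.normal(size=20)           # stand-in for the likelihood gradient
direction = conjugate_gradient(
    lambda u: metric_vector_product(sample_grads, u, damping=1e-3),
    ml_gradient,
)
```

Truncating the solver after a few iterations is what places this kind of scheme in the truncated Newton family: each iteration costs one pass over the sample gradients, and the parameter-by-parameter metric is never materialized.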
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
