Ratio Divergence Learning Using Target Energy in Restricted Boltzmann Machines: Beyond Kullback--Leibler Divergence Learning
Yuichi Ishida, Yuma Ichikawa, Aki Dote, Toshiyuki Miyazawa, Koji Hukushima

TL;DR
This paper introduces ratio divergence (RD) learning for discrete energy-based models, especially RBMs, combining forward and reverse KLD to improve training stability, mode coverage, and energy fitting beyond traditional methods.
Contribution
The paper proposes a novel RD learning method that effectively combines forward and reverse KLD for training RBMs, addressing key issues like underfitting and mode-collapse.
Findings
RD learning outperforms existing methods in energy fitting
It improves mode coverage and training stability
Performance gains increase with model complexity
Abstract
We propose ratio divergence (RD) learning for discrete energy-based models, a method that utilizes both training data and a tractable target energy function. We apply RD learning to restricted Boltzmann machines (RBMs), which are a minimal model that satisfies the universal approximation theorem for discrete distributions. RD learning combines the strength of both forward and reverse Kullback-Leibler divergence (KLD) learning, effectively addressing the "notorious" issues of underfitting with the forward KLD and mode-collapse with the reverse KLD. Since the summation of forward and reverse KLD seems to be sufficient to combine the strength of both approaches, we include this learning method as a direct baseline in numerical experiments to evaluate its effectiveness. Numerical experiments demonstrate that RD learning significantly outperforms other learning methods in terms of energy…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
