Adaptive Symmetrization of the KL Divergence
Omri Ben-Dov, Luiz F.O. Chamon

TL;DR
This paper introduces a non-adversarial method to minimize the symmetric Jeffreys divergence by jointly fitting a main model and a proxy model, improving stability and accuracy over traditional methods like MLE and GANs.
Contribution
It proposes a novel, practical algorithm that adaptively minimizes the Jeffreys divergence without adversarial training, using a proxy model for tractable optimization.
Findings
More stable training compared to GANs
Higher accuracy in density estimation tasks
Effective in low-data regimes
Abstract
The forward Kullback-Leibler (KL) divergence is a ubiquitous objective for fitting a parameterized distribution to samples due to its tractability and equivalence to maximum likelihood estimation (MLE). Its inherent asymmetry, however, may lead to degenerate solutions that generalize poorly. While the symmetric Jeffreys divergence offers a more balanced alternative, its optimization is challenging due to the presence of a reverse KL term. Generative adversarial networks (GANs) bypass this intractability using a min-max formulation at the cost of introducing new instability issues. This work proposes a non-adversarial approach to minimize the Jeffreys divergence. To do so, it uses a proxy model to tractably approximate the reverse KL divergence of the main model. The main and proxy models are jointly fitted to the data using a constrained optimization formulation to obtain a practical…
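To make the asymmetry mentioned in the abstract concrete, the sketch below computes the forward KL, the reverse KL, and the Jeffreys divergence for two small discrete distributions. It assumes the common unnormalized definition, Jeffreys(p, q) = KL(p || q) + KL(q || p); some references include a factor of 1/2, and the paper's exact convention is not stated in this summary. The function names `kl` and `jeffreys` and the example distributions are hypothetical, chosen for illustration only. This is not the paper's algorithm, which targets the setting where the reverse KL term cannot be evaluated directly.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jeffreys(p, q):
    """Jeffreys divergence, assumed here to be the sum of forward and reverse KL."""
    return kl(p, q) + kl(q, p)

# Illustrative distributions (not taken from the paper).
p = [0.5, 0.4, 0.1]   # stands in for the data distribution
q = [0.8, 0.1, 0.1]   # stands in for the model distribution

forward = kl(p, q)    # the term that MLE minimizes, estimable from samples of p
reverse = kl(q, p)    # needs the density of p, which is unknown in practice

print("forward KL:", forward)
print("reverse KL:", reverse)
print("Jeffreys  :", jeffreys(p, q))
```

The two KL values differ, while the Jeffreys divergence is unchanged when `p` and `q` are swapped. With real data, only samples from `p` are available, so the forward term can be estimated but the reverse term cannot, which is the intractability the abstract refers to.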
