On the Natural Gradient of the Evidence Lower Bound
Nihat Ay, Jesse van Oostrum, Adwait Datar

TL;DR
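This paper studies the natural (Fisher-Rao) gradient of the ELBO in generative models. It shows that the gap between the evidence and the ELBO has an essentially vanishing natural gradient in unconstrained optimization, so maximizing the ELBO is equivalent to minimizing the KL divergence from a target distribution. It also introduces cylindrical models to characterize when this equivalence persists under optimization constrained to a model.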
Contribution
It provides a geometric characterization of when ELBO maximization is equivalent to KL divergence minimization under constraints, through the notion of cylindrical models.
Findings
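The gap between the evidence and the ELBO has an essentially vanishing natural gradient in unconstrained optimization.
Consequently, ELBO maximization is equivalent to minimizing the KL divergence from a target distribution.
Cylindrical models give a geometric condition under which this equivalence persists when optimization is constrained to a model.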
Abstract
This article studies the Fisher-Rao gradient, also referred to as the natural gradient, of the evidence lower bound (ELBO), which plays a central role in generative machine learning. It reveals that the gap between the evidence and its lower bound, the ELBO, has an essentially vanishing natural gradient within unconstrained optimization. As a result, maximization of the ELBO is equivalent to minimization of the Kullback-Leibler divergence from a target distribution, the primary objective function of learning. Building on this insight, we derive a condition under which this equivalence persists even when optimization is constrained to a model. This condition yields a geometric characterization, which we formalize through the notion of a cylindrical model.
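The "gap between the evidence and its lower bound" mentioned in the abstract can be made concrete with the standard identity log p(x) = ELBO(x) + KL(q(z|x) || p(z|x)). The sketch below checks this identity numerically on a small discrete example. It is only an illustration under stated assumptions: the toy joint distribution, the array names, and the state-space sizes are hypothetical and are not taken from the paper, and the code does not compute any natural gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy joint p(x, z): 3 observed states, 4 latent states.
joint = rng.random((3, 4))
joint /= joint.sum()

# Arbitrary variational conditional q(z | x), one row per value of x.
q = rng.random((3, 4))
q /= q.sum(axis=1, keepdims=True)

evidence = joint.sum(axis=1)            # p(x)
posterior = joint / evidence[:, None]   # p(z | x)

# ELBO(x) = sum_z q(z|x) * [log p(x, z) - log q(z|x)]
elbo = (q * (np.log(joint) - np.log(q))).sum(axis=1)

# Gap(x) = KL(q(.|x) || p(.|x)), which is always non-negative.
gap = (q * (np.log(q) - np.log(posterior))).sum(axis=1)

log_evidence = np.log(evidence)

# The log-evidence splits exactly into the ELBO plus the gap.
print(np.allclose(log_evidence, elbo + gap))
```

Because the gap is a KL divergence, it is non-negative for every x, which is why the ELBO is a lower bound on the log-evidence. The paper's result concerns the natural gradient of this gap, which the sketch does not attempt to reproduce.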
Taxonomy
Topics: Neural Networks and Applications
