Relation between the Kantorovich-Wasserstein metric and the Kullback-Leibler divergence
Roman V. Belavkin

TL;DR
This paper relates the Kantorovich-Wasserstein metric to the Kullback-Leibler divergence by comparing the optimal transport problem with information-constrained variational problems for finding an optimal channel.
Contribution
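It links the optimal transport problem (OTP) to the optimal channel problem (OCP) with a constraint on the KL-divergence, giving a variational and geometric view of the relation between the Kantorovich-Wasserstein (KW) metric and the Kullback-Leibler (KL) divergence.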
Findings
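OTP is equivalent to OCP with one additional constraint that fixes the output measure.
OCP with a constraint on the KL-divergence therefore gives a lower bound on the KW-metric.
A decomposition of the KL-divergence based on the law of cosines, combined with the dual formulation of OTP, relates the KL-divergence to the KW-metric.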
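The lower-bound finding can be illustrated on a two-point space. The sketch below is not taken from the paper: the cost matrix, the marginals, and all function names are hypothetical choices made for illustration. It computes the exact transport cost for a 2x2 problem (where couplings form a one-parameter family) and compares it with the same minimization after dropping the output-marginal constraint, which corresponds to the channel problem with no active information constraint. It also evaluates the mutual information of the optimal plan as a KL-divergence between the joint and the product of its marginals.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for discrete distributions given as flat lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X;Y) computed as the KL-divergence from the joint to the product of its marginals."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    flat_joint = [w for r in joint for w in r]
    flat_prod = [ri * cj for ri in rows for cj in cols]
    return kl_divergence(flat_joint, flat_prod)

def expected_cost(joint, cost):
    """Expected cost of a joint distribution under a cost matrix."""
    return sum(w * c for jr, cr in zip(joint, cost) for w, c in zip(jr, cr))

def ot_2x2(q_in, p_out, cost):
    """Exact optimal transport on two-point spaces.

    Couplings with row sums q_in and column sums p_out are parametrized
    by t = joint[0][0]; the cost is linear in t, so an endpoint is optimal.
    """
    lo = max(0.0, q_in[0] + p_out[0] - 1.0)
    hi = min(q_in[0], p_out[0])
    best_value, best_plan = None, None
    for t in (lo, hi):
        plan = [[t, q_in[0] - t],
                [p_out[0] - t, 1.0 - q_in[0] - p_out[0] + t]]
        value = expected_cost(plan, cost)
        if best_value is None or value < best_value:
            best_value, best_plan = value, plan
    return best_value, best_plan

def relaxed_cost(q_in, cost):
    """Minimum expected cost when only the input marginal is fixed.

    Each input point sends all its mass to its cheapest output point.
    """
    return sum(qi * min(row) for qi, row in zip(q_in, cost))

# Hypothetical example data (not from the paper).
q_in = [0.5, 0.5]            # input measure
p_out = [0.8, 0.2]           # output measure fixed in the transport problem
cost = [[0.0, 1.0],
        [1.0, 0.0]]          # cost c(x, y) = |x - y|

ot_value, ot_plan = ot_2x2(q_in, p_out, cost)
relaxed_value = relaxed_cost(q_in, cost)
ot_information = mutual_information(ot_plan)

print("transport cost:", ot_value)
print("relaxed (output marginal free):", relaxed_value)
print("mutual information of optimal plan (nats):", ot_information)
```

Removing a constraint can only lower a minimum, so the relaxed value never exceeds the transport cost. The paper's bound is sharper than this unconstrained relaxation because it keeps a constraint on the KL-divergence; the sketch only shows the direction of the inequality.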
Abstract
We discuss a relation between the Kantorovich-Wasserstein (KW) metric and the Kullback-Leibler (KL) divergence. The former is defined using the optimal transport problem (OTP) in the Kantorovich formulation. The latter is used to define entropy and mutual information, which appear in variational problems to find an optimal channel (OCP) from rate distortion theory and the value of information theory. We show that OTP is equivalent to OCP with one additional constraint fixing the output measure, and therefore OCP with constraints on the KL-divergence gives a lower bound on the KW-metric. The dual formulation of OTP allows us to explore the relation between the KL-divergence and the KW-metric using a decomposition of the former based on the law of cosines. This way we show the link between the two divergences using variational and geometric principles.
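The two variational problems named in the abstract can be written side by side. The notation below is an assumption made for this summary and may differ from the paper's: q is the input measure on X, p the output measure on Y, w a joint measure on X × Y with marginals π_X w and π_Y w, c a cost function, and λ a bound on mutual information.

```latex
% Optimal transport problem (Kantorovich formulation): both marginals fixed
K_c(q, p) = \inf_{w} \bigl\{ \mathbb{E}_w[c] \;:\; \pi_X w = q,\ \pi_Y w = p \bigr\}

% Optimal channel problem: input marginal fixed, information bounded
R_c(q, \lambda) = \inf_{w} \bigl\{ \mathbb{E}_w[c] \;:\; \pi_X w = q,\ I_w(X, Y) \le \lambda \bigr\}

% Mutual information as a KL-divergence from the joint to the product of its marginals
I_w(X, Y) = D_{\mathrm{KL}}\bigl( w \,\|\, \pi_X w \otimes \pi_Y w \bigr)

% If an optimal transport plan w^* exists with I_{w^*}(X, Y) \le \lambda,
% then w^* is feasible for the channel problem, so
R_c(q, \lambda) \le K_c(q, p)
```

Read this way, the lower bound follows from feasibility alone: a transport plan that satisfies the information constraint is an admissible channel, so the channel problem cannot have a larger minimum.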
