Well-Posed KL-Regularized Control via Wasserstein and Kalman-Wasserstein KL Divergences
Viktor Stein, Adwait Datar, Nihat Ay

TL;DR
This paper introduces Wasserstein and Kalman-Wasserstein KL divergences as alternatives to classical KL regularization in reinforcement learning, providing well-posedness under support mismatch and low-noise limits, with demonstrated improvements in control tasks.
Contribution
The paper develops a unified geometric framework for KL analogues built on transport-based geometries, yielding divergences that remain finite under support mismatch and give well-posed regularization in control problems.
Findings
The proposed divergences remain finite under support mismatch.
The regularized control problems become well-posed.
Control performance improves in the experiments.
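The support-mismatch finding can be illustrated with a minimal sketch. It uses the standard Wasserstein-1 distance as a stand-in for a transport-based quantity, not the paper's Wasserstein KL divergence, whose definition is not given in this summary. The grid, the two histograms, and the helper names `kl_divergence` and `wasserstein1_1d` are assumptions chosen for illustration.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL(p || q); infinite if p has mass where q has none."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    if np.any(q[support] == 0):
        return float("inf")
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

def wasserstein1_1d(x, p, q):
    """Wasserstein-1 distance on a sorted 1D grid via the CDF-gap formula."""
    x = np.asarray(x, dtype=float)
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))
    # Integrate |F_p - F_q| over each interval between grid points.
    return float(np.sum(cdf_gap[:-1] * np.diff(x)))

# Hypothetical histograms with disjoint supports (illustration only).
x = [0.0, 1.0, 2.0, 3.0]
p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.0, 0.5, 0.5]

kl_pq = kl_divergence(p, q)
w1_pq = wasserstein1_1d(x, p, q)
print("KL:", kl_pq, "W1:", w1_pq)
```

The KL value is infinite because `p` puts mass where `q` has none, while the transport distance only measures how far the mass has to move.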
Abstract
Kullback-Leibler divergence (KL) regularization is widely used in reinforcement learning, but it becomes infinite under support mismatch and can degenerate in low-noise limits. Utilizing a unified information-geometric framework, we introduce (Kalman)-Wasserstein-based KL analogues by replacing the Fisher-Rao geometry in the dynamical formulation of the KL with transport-based geometries, and we derive closed-form values for common distribution families. These divergences remain finite under support mismatch and yield a geometric interpretation of regularization heuristics used in Kalman ensemble methods. We demonstrate the utility of these divergences in KL-regularized optimal control. In the fully tractable setting of linear time-invariant systems with Gaussian process noise, the classical KL reduces to a quadratic control penalty that becomes singular as process noise vanishes. Our…
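The low-noise degeneracy described in the abstract can be sketched numerically. The example below is an illustration under stated assumptions, not the paper's derivation: it assumes isotropic process noise with covariance sigma^2 I, hypothetical system matrices `A` and `B`, and it compares the classical Gaussian KL with the squared 2-Wasserstein distance between the controlled and uncontrolled one-step transition laws. The paper's (Kalman)-Wasserstein KL divergences are different objects whose closed forms are not reproduced here.

```python
import numpy as np

def gaussian_kl(m1, S1, m2, S2):
    """KL( N(m1, S1) || N(m2, S2) ) for nondegenerate Gaussians."""
    d = m1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = m2 - m1
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - d + logdet2 - logdet1)

# Hypothetical LTI system x_next = A x + B u + w, with w ~ N(0, sigma^2 I).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
x = np.array([1.0, 0.0])
u = np.array([2.0])

def penalties(sigma):
    """Return (KL, squared W2) between controlled and uncontrolled transitions."""
    cov = sigma**2 * np.eye(2)
    mean_controlled = A @ x + B @ u
    mean_passive = A @ x
    kl = gaussian_kl(mean_controlled, cov, mean_passive, cov)
    # With equal covariances, squared W2 is the squared distance between means.
    w2_sq = float(np.sum((mean_controlled - mean_passive) ** 2))
    return kl, w2_sq

for sigma in (1.0, 0.1, 0.01):
    kl, w2_sq = penalties(sigma)
    print(f"sigma={sigma}: KL={kl:.1f}, W2^2={w2_sq:.1f}")
```

With equal covariances the KL equals ||B u||^2 / (2 sigma^2), a quadratic control penalty whose weight grows without bound as sigma shrinks, whereas the squared transport distance ||B u||^2 does not depend on sigma.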
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques
