Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning
Toshinori Kitamura, Lingwei Zhu, Takamitsu Matsubara

TL;DR
This paper introduces Geometric Value Iteration (GVI), a novel reinforcement learning algorithm that adaptively tunes KL regularization to improve robustness and stability, especially when using deep networks.
Contribution
It presents the first asymptotic error bound for dynamic KL coefficient schemes and proposes GVI, which dynamically adjusts regularization to balance learning speed and robustness.
Findings
GVI outperforms fixed-coefficient methods in stability and convergence.
Dynamic error-aware KL tuning improves robustness in deep RL.
GVI maintains stable learning without target networks.
Abstract
The recent boom in the literature on entropy-regularized reinforcement learning (RL) approaches reveals that Kullback-Leibler (KL) regularization brings advantages to RL algorithms by canceling out errors under mild assumptions. However, existing analyses focus on fixed regularization with a constant weighting coefficient and do not consider cases where the coefficient is allowed to change dynamically. In this paper, we study the dynamic coefficient scheme and present the first asymptotic error bound. Based on the dynamic coefficient error bound, we propose an effective scheme to tune the coefficient according to the magnitude of error in favor of more robust learning. Complementing this development, we propose a novel algorithm, Geometric Value Iteration (GVI), that features a dynamic error-aware KL coefficient design with the aim of mitigating the impact of errors on performance. Our…
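The abstract describes KL-regularized value iteration whose coefficient varies with the magnitude of error. As a rough illustration only, the sketch below runs a tabular KL-regularized iteration in which the policy step is the standard closed-form update pi_new(a|s) ∝ pi(a|s) · exp(Q(s,a) / lam). The coefficient rule `hypothetical_coefficient`, the random MDP, and the injected Gaussian noise standing in for approximation error are all assumptions made for this example; they are not the paper's GVI scheme, which the text above does not specify.

```python
import numpy as np


def kl_regularized_step(log_pi, q, lam):
    """Log of the policy maximizing <pi', q> - lam * KL(pi' || pi) in each state.

    The maximizer is pi'(a|s) proportional to pi(a|s) * exp(q(s, a) / lam).
    Working in log space avoids log(0) when probabilities underflow.
    """
    logits = log_pi + q / lam
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def evaluate_q(P, R, log_pi, q, gamma):
    """One Bellman expectation backup: Q <- R + gamma * P (pi . Q)."""
    v = (np.exp(log_pi) * q).sum(axis=1)  # state values, shape (S,)
    return R + gamma * (P @ v)            # P has shape (S, A, S)


def hypothetical_coefficient(error_magnitude, lam_min=0.1, scale=1.0):
    """Illustrative rule (an assumption, not from the paper):
    a larger observed error gives a stronger KL penalty."""
    return lam_min + scale * error_magnitude


def run(num_iters=50, seed=0, noise_scale=0.1):
    rng = np.random.default_rng(seed)
    S, A, gamma = 5, 3, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))  # random transitions (S, A, S)
    R = rng.uniform(size=(S, A))                # random rewards
    log_pi = np.full((S, A), -np.log(A))        # uniform initial policy
    q = np.zeros((S, A))
    for _ in range(num_iters):
        # Injected noise stands in for function-approximation error.
        noise = rng.normal(scale=noise_scale, size=(S, A))
        q = evaluate_q(P, R, log_pi, q, gamma) + noise
        # The error is known here because we injected it; in practice it
        # would have to be estimated (for example from TD errors).
        lam = hypothetical_coefficient(np.abs(noise).max())
        log_pi = kl_regularized_step(log_pi, q, lam)
    return log_pi, q


if __name__ == "__main__":
    log_pi, q = run()
    print(np.exp(log_pi).round(3))
```

A large `lam` keeps the new policy close to the previous one, which averages out noisy value estimates over iterations; a small `lam` approaches the greedy update and learns faster but follows the noise more closely. That trade-off between learning speed and robustness is what a dynamic coefficient is meant to manage.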
Taxonomy
Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Traffic control and management
