Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning

Toshinori Kitamura; Lingwei Zhu; Takamitsu Matsubara

arXiv:2107.07659 · cs.LG · October 6, 2021

TL;DR

This paper introduces Geometric Value Iteration (GVI), a novel reinforcement learning algorithm that adaptively tunes KL regularization to improve robustness and stability, especially when using deep networks.

Contribution

The paper presents the first asymptotic error bound for dynamic KL coefficient schemes and proposes GVI, which adjusts the regularization coefficient dynamically to balance learning speed against robustness.
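
As a point of reference, the KL-regularized greedy step that dynamic-coefficient schemes build on can be written with an iteration-dependent coefficient τ_k. The notation below (Q_k, π_k, τ_k) is a standard textbook form chosen here for illustration and is not necessarily the paper's parameterization:

```latex
\pi_{k+1}(\cdot \mid s) = \arg\max_{\pi} \Big\{ \langle \pi, Q_k(s,\cdot) \rangle
  - \tau_k \, \mathrm{KL}\big(\pi \,\|\, \pi_k(\cdot \mid s)\big) \Big\}
\;\Longrightarrow\;
\pi_{k+1}(a \mid s) = \frac{\pi_k(a \mid s) \exp\!\big(Q_k(s,a)/\tau_k\big)}
                           {\sum_{b} \pi_k(b \mid s) \exp\!\big(Q_k(s,b)/\tau_k\big)}

% Unrolling the recursion back to \pi_0:
\pi_{k+1}(a \mid s) \;\propto\; \pi_0(a \mid s) \exp\!\Big( \sum_{j=0}^{k} \frac{Q_j(s,a)}{\tau_j} \Big)
```

With a constant coefficient τ_j = τ, the exponent is an unweighted sum of all past Q estimates, which is the implicit averaging through which KL regularization lets independent errors partially cancel. With a dynamic coefficient, each estimate is weighted by 1/τ_j, so raising τ_j at iterations where Q_j is believed to be inaccurate reduces that estimate's influence on the policy.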

Findings

1. GVI outperforms fixed-coefficient methods in stability and convergence.
2. Dynamic, error-aware KL tuning improves robustness in deep RL.
3. GVI maintains stable learning without target networks.

Abstract

The recent boom in the literature on entropy-regularized reinforcement learning (RL) approaches reveals that Kullback-Leibler (KL) regularization brings advantages to RL algorithms by canceling out errors under mild assumptions. However, existing analyses focus on fixed regularization with a constant weighting coefficient and do not consider cases where the coefficient is allowed to change dynamically. In this paper, we study the dynamic coefficient scheme and present the first asymptotic error bound. Based on the dynamic coefficient error bound, we propose an effective scheme to tune the coefficient according to the magnitude of error in favor of more robust learning. Complementing this development, we propose a novel algorithm, Geometric Value Iteration (GVI), that features a dynamic error-aware KL coefficient design with the aim of mitigating the impact of errors on performance. Our…
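
To make the idea of an error-aware coefficient concrete, here is a minimal tabular sketch in Python (standard library only). It runs KL-regularized value iteration in which the coefficient τ_k may change at every iteration, and it adds Gaussian noise to the Bellman backup as a stand-in for approximation error. The coefficient rule `error_aware_tau`, its constants, the error proxy (largest change between consecutive Q estimates), and all function names are hypothetical choices made for this illustration; this is not the rule GVI actually uses.

```python
import math
import random


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def make_random_mdp(n_states, n_actions, seed=0):
    """Random tabular MDP: P[s][a] is a distribution over next states, R[s][a] in [0, 1)."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            z = sum(w)
            rows.append([x / z for x in w])
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R


def error_aware_tau(err, tau_min=0.05, tau_max=1.0, scale=2.0):
    """HYPOTHETICAL coefficient rule (not the paper's): regularize more
    when the error proxy is large, less when it is small."""
    return min(tau_max, tau_min + scale * err)


def kl_regularized_vi(P, R, gamma, n_iters, tau_fn, noise_std=0.0, seed=1):
    """KL-regularized value iteration with a per-iteration coefficient tau_k.

    pi_{k+1}(a|s) ∝ pi_k(a|s) * exp(Q_k(s, a) / tau_k)
    V_{k+1}(s)    = tau_k * log sum_b pi_k(b|s) * exp(Q_k(s, b) / tau_k)
    Gaussian noise on Q_k stands in for function-approximation error.
    """
    rng = random.Random(seed)
    S, A = len(R), len(R[0])
    log_pi = [[-math.log(A)] * A for _ in range(S)]  # uniform pi_0
    V = [0.0] * S
    prev_q, err, taus = None, math.inf, []  # no error estimate before step 1
    for _ in range(n_iters):
        # Noisy Bellman backup: Q_k = R + gamma * P V_k + eps_k.
        q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              + rng.gauss(0.0, noise_std) for a in range(A)] for s in range(S)]
        if prev_q is not None:
            # Error proxy: largest change between consecutive Q estimates.
            err = max(abs(q[s][a] - prev_q[s][a]) for s in range(S) for a in range(A))
        tau = tau_fn(err)
        taus.append(tau)
        new_log_pi, new_V = [], []
        for s in range(S):
            logits = [log_pi[s][a] + q[s][a] / tau for a in range(A)]
            lz = logsumexp(logits)
            new_log_pi.append([x - lz for x in logits])  # normalized log pi_{k+1}
            new_V.append(tau * lz)  # = <pi_{k+1}, Q_k> - tau * KL(pi_{k+1} || pi_k)
        log_pi, V, prev_q = new_log_pi, new_V, q
    return V, log_pi, taus
```

For example, on the same MDP one can run `kl_regularized_vi(P, R, 0.9, 200, error_aware_tau, noise_std=0.5)` alongside a constant coefficient such as `lambda err: 0.1` and compare the resulting values and policies. The design choice the sketch illustrates is the trade-off the abstract describes: when the measured error is large, the rule raises τ_k, which slows learning but limits how far a single noisy estimate can move the policy; as the error proxy shrinks, τ_k falls toward `tau_min` and the updates become closer to greedy.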


Taxonomy

Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Traffic control and management