Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning

Lingwei Zhu; Zheng Chen; Eiji Uchibe; Takamitsu Matsubara

arXiv:2205.07885·cs.LG·May 18, 2022

Open Access

TL;DR

This paper introduces Tsallis Advantage Learning (TAL), a method that enforces KL regularization in Tsallis entropy reinforcement learning. TAL improves robustness and performance over existing Tsallis entropy approaches and achieves results competitive with Shannon entropy methods.

Contribution

The paper proposes TAL, a novel approach that incorporates KL regularization into Tsallis entropy RL, addressing approximation errors and enhancing empirical performance.

Findings

01. TAL significantly outperforms Tsallis-DQN on various non-closed-form Tsallis entropies.

02. TAL achieves performance comparable to state-of-the-art Shannon entropy algorithms.

03. Enforcing KL regularization improves robustness and generalization in Tsallis entropy RL.

Abstract

The maximum Tsallis entropy (MTE) framework in reinforcement learning has recently gained popularity by virtue of its flexible modeling choices, which include the widely used Shannon entropy and sparse entropy. However, non-Shannon entropies suffer from approximation error and subsequent underperformance, due either to their sensitivity or to the lack of a closed-form policy expression. To improve the tradeoff between flexibility and empirical performance, we propose to strengthen their error-robustness by enforcing implicit Kullback-Leibler (KL) regularization in MTE, motivated by Munchausen DQN (MDQN). We do so by drawing a connection between MDQN and advantage learning, through which MDQN is shown to fail to generalize to the MTE framework. The proposed method, Tsallis Advantage Learning (TAL), is verified in extensive experiments to not only significantly improve upon Tsallis-DQN for various non-closed-form…
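To make the "flexible modeling choices" concrete, here is a minimal Python sketch of the textbook Tsallis entropy of a discrete policy, S_q(p) = (1 - sum_i p_i^q) / (q - 1). This is the standard definition, not code from the paper, and the paper's own normalization may differ (for example, sparse entropy is often written with an extra factor of 1/2). The function name `tsallis_entropy` and the example policy are hypothetical, chosen only for illustration.

```python
import math


def tsallis_entropy(probs, q):
    """Textbook Tsallis entropy of a discrete distribution.

    S_q(p) = (1 - sum_i p_i^q) / (q - 1), with the Shannon entropy
    -sum_i p_i ln p_i recovered in the limit q -> 1.
    Assumption: this normalization is the standard one and may differ
    from the convention used in the paper.
    """
    if abs(q - 1.0) < 1e-8:
        # Shannon limit; terms with p = 0 contribute nothing.
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


# Hypothetical three-action policy at some state.
policy = [0.7, 0.2, 0.1]

shannon = tsallis_entropy(policy, 1.0)              # q = 1: Shannon entropy
near_shannon = tsallis_entropy(policy, 1.0 + 1e-6)  # q close to 1 approaches it
sparse_like = tsallis_entropy(policy, 2.0)          # q = 2: 1 - sum_i p_i^2
```

Varying `q` interpolates between entropy families: `q = 1` gives the Shannon case, and `q = 2` gives the quadratic form underlying sparse entropy, up to a constant factor. Other values of `q` generally have no closed-form optimal policy, which is the setting the abstract refers to as non-closed-form.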

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics

Methods: Convolution · Q-Learning · Dense Connections · Deep Q-Network