Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning
Lingwei Zhu, Zheng Chen, Eiji Uchibe, Takamitsu Matsubara

TL;DR
This paper introduces Tsallis Advantage Learning (TAL), a method that enforces KL regularization in Tsallis entropy reinforcement learning. TAL improves robustness and performance over Tsallis-DQN and achieves results competitive with Shannon entropy methods.
Contribution
The paper proposes TAL, a novel approach that incorporates KL regularization into Tsallis entropy RL, addressing approximation errors and enhancing empirical performance.
Findings
TAL significantly outperforms Tsallis-DQN on various non-closed-form Tsallis entropies.
TAL achieves performance comparable to state-of-the-art Shannon entropy algorithms.
Enforcing implicit KL regularization improves the error-robustness of Tsallis entropy RL.
Abstract
The maximum Tsallis entropy (MTE) framework in reinforcement learning has recently gained popularity by virtue of its flexible modeling choices, which include the widely used Shannon entropy and sparse entropy. However, non-Shannon entropies suffer from approximation error and subsequent underperformance, due either to their sensitivity or to the lack of a closed-form policy expression. To improve the tradeoff between flexibility and empirical performance, we propose to strengthen their error-robustness by enforcing implicit Kullback-Leibler (KL) regularization in MTE, motivated by Munchausen DQN (MDQN). We do so by drawing a connection between MDQN and advantage learning, by which MDQN is shown to fail to generalize to the MTE framework. The proposed method, Tsallis Advantage Learning (TAL), is verified in extensive experiments to not only significantly improve upon Tsallis-DQN for various non-closed-form…
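The two ideas the abstract relies on can be illustrated with a small sketch. It is not the paper's TAL algorithm. It assumes one common convention for the Tsallis entropy, S_q(pi) = -sum_a pi(a) ln_q pi(a) with ln_q(x) = (x^(q-1) - 1)/(q-1); the paper may use a different scaling. It also uses the Shannon-case identity behind Munchausen DQN: for a softmax policy with temperature tau, the scaled log-policy tau * ln pi(a|s) equals a soft advantage Q(s,a) - V(s), where V is the log-sum-exp value. All function names are hypothetical.

```python
import math

def log_q(x, q):
    """q-logarithm; reduces to the natural log as q -> 1 (assumed convention)."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (q - 1.0) - 1.0) / (q - 1.0)

def tsallis_entropy(probs, q):
    """S_q(pi) = -sum_a pi(a) * log_q(pi(a)); zero-probability actions contribute 0."""
    return -sum(p * log_q(p, q) for p in probs if p > 0.0)

def softmax_policy(q_values, tau):
    """Shannon-entropy (softmax) policy pi(a) proportional to exp(Q(a) / tau)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / tau) for v in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def soft_advantage(q_values, tau):
    """Q(a) - V with V = tau * log(sum_b exp(Q(b) / tau)), the soft value."""
    m = max(q_values)
    soft_value = m + tau * math.log(sum(math.exp((v - m) / tau) for v in q_values))
    return [v - soft_value for v in q_values]

# q = 1 gives the Shannon entropy; q = 2 gives sum_a pi(a) * (1 - pi(a)),
# which matches the sparse entropy up to a constant factor in some conventions.
uniform = [0.5, 0.5]
shannon = tsallis_entropy(uniform, 1.0)
sparse = tsallis_entropy(uniform, 2.0)

# The Munchausen log-policy term coincides with the soft advantage.
q_values, tau = [1.0, 2.0, 0.5], 0.3
pi = softmax_policy(q_values, tau)
scaled_log_pi = [tau * math.log(p) for p in pi]
adv = soft_advantage(q_values, tau)
```

The identity checked at the end holds only for the softmax policy. Outside the Shannon case (q = 1) the policy is generally not a softmax, so the log-policy term no longer corresponds to an advantage; this is consistent with the abstract's statement that MDQN fails to generalize to the MTE framework.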
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics
Methods: Convolution · Q-Learning · Dense Connections · Deep Q-Network
