Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence
Lingwei Zhu, Zheng Chen, Matthew Schlegel, Martha White

TL;DR
This paper introduces a generalized policy regularization method in reinforcement learning using Tsallis KL divergence, extending traditional KL approaches, and demonstrates its effectiveness through improved performance on Atari games.
Contribution
It proposes a novel reinforcement learning algorithm that incorporates Tsallis KL divergence, generalizing existing methods and showing empirical benefits over standard approaches.
Findings
Tsallis KL divergence generalizes standard KL with a parameter q.
Generalized MVI(q) outperforms standard MVI in Atari games.
q > 1 can provide benefits in policy learning.
Abstract
Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigating a generalized KL divergence -- called the Tsallis KL divergence -- which uses the q-logarithm in the definition. The approach is a strict generalization, as q = 1 corresponds to the standard KL divergence; q > 1 provides a range of new options. We characterize the types of policies learned under the Tsallis KL, and motivate when q > 1 could be beneficial. To obtain a practical algorithm that incorporates Tsallis KL regularization, we extend MVI, which is one of the simplest approaches…
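To make the "strict generalization" claim concrete, here is a minimal numerical sketch of a Tsallis KL divergence between two discrete policies. It is not taken from the paper: the function names `q_log` and `tsallis_kl` are hypothetical, and the sketch assumes the convention ln_q(x) = (x^(1-q) - 1) / (1 - q) with D_q(pi || mu) = -sum_a pi(a) ln_q(mu(a) / pi(a)). Some works write the q-logarithm with exponent q - 1 instead, which amounts to swapping q with 2 - q, so the paper's exact parameterization may differ. Both policies are assumed to have strictly positive probabilities.

```python
import numpy as np

def q_log(x, q):
    """q-logarithm under the assumed convention (x^(1-q) - 1) / (1 - q)."""
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        # The q -> 1 limit of the expression below is the natural logarithm.
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_kl(pi, mu, q):
    """Tsallis KL divergence D_q(pi || mu) for strictly positive distributions."""
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(-np.sum(pi * q_log(mu / pi, q)))

# Illustrative (made-up) action distributions over three actions.
pi = np.array([0.7, 0.2, 0.1])
mu = np.array([0.5, 0.3, 0.2])

for q in (1.0, 1.5, 2.0):
    print(f"q = {q}: D_q(pi || mu) = {tsallis_kl(pi, mu, q):.6f}")
```

Under this convention, q = 1 reproduces the standard KL divergence sum_a pi(a) ln(pi(a) / mu(a)), and q = 2 reduces to sum_a pi(a)^2 / mu(a) - 1, so larger q penalizes actions where pi greatly exceeds mu more heavily.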
Taxonomy
Topics: Decision-Making and Behavioral Economics
Methods: Trust Region Policy Optimization
