Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning
Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, and Songhwai Oh

TL;DR
This paper introduces Tsallis MDPs, a generalized framework for maximum entropy reinforcement learning in which a tunable entropic index selects the entropy measure, and with it the exploration behavior of the optimal policy.
Contribution
The paper proposes Tsallis MDPs, analyzes them mathematically (optimality condition, performance error bounds, and convergence), and develops a model-free actor-critic method based on Tsallis entropy maximization.
Findings
Tsallis entropy controls exploration behavior.
Different entropic indices suit different RL problems.
The proposed method achieves state-of-the-art results on MuJoCo tasks.
Abstract
In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter, called an entropic index. By controlling the entropic index, we can generate various types of entropy, including the SG entropy, and a different entropy results in a different class of the optimal policy in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs, including the optimality condition, performance error bounds, and convergence. Our theoretical result enables us to use any positive entropic index in RL. To handle complex and large-scale problems,…
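To make the role of the entropic index concrete, here is a minimal sketch of the standard Tsallis entropy of a discrete distribution, S_q(p) = (1 - sum_i p_i^q) / (q - 1), which tends to the Shannon-Gibbs entropy as q approaches 1. This is the textbook definition, not code from the paper; the paper's own normalization may differ, and the function name `tsallis_entropy` is hypothetical.

```python
import math


def tsallis_entropy(probs, q):
    """Standard Tsallis entropy of a discrete distribution (assumed definition).

    S_q(p) = (1 - sum_i p_i**q) / (q - 1) for q != 1, with the
    Shannon-Gibbs entropy -sum_i p_i * log(p_i) as the q -> 1 limit.
    The entropic index q is assumed to be positive.
    """
    if q <= 0:
        raise ValueError("entropic index q must be positive")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a probability distribution")
    if abs(q - 1.0) < 1e-12:
        # Shannon-Gibbs case; zero-probability outcomes contribute nothing.
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    for q in (0.5, 1.0, 2.0):
        print(f"q={q}: S_q(uniform) = {tsallis_entropy(uniform, q):.4f}")
```

Under this definition a deterministic distribution has zero entropy for every positive q, while the value assigned to a spread-out distribution depends on q. That dependence is one way to see why different entropic indices can lead to different optimal policies.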
Taxonomy
Topics: stochastic dynamics and bifurcation · Reinforcement Learning in Robotics · Neural dynamics and brain function
