Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning

Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, and Songhwai Oh

arXiv:1902.00137 · cs.LG · February 8, 2019 · 19 citations

TL;DR

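This paper introduces Tsallis MDPs, a framework that generalizes maximum entropy reinforcement learning by replacing the Shannon-Gibbs entropy with Tsallis entropy, whose form is tuned by a real-valued entropic index. Choosing the index controls exploration and can improve performance on complex tasks.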
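
A minimal sketch of the entropy measure, assuming the standard Tsallis definition S_q(p) = E_{x~p}[-ln_q p(x)] = (1 - Σ_x p(x)^q)/(q - 1), where the q-logarithm ln_q(x) = (x^(q-1) - 1)/(q - 1) becomes the natural logarithm at q = 1. The helper names log_q and tsallis_entropy are illustrative and not taken from the authors' code. At q = 1 the function returns the Shannon-Gibbs entropy, which is how a single entropic index covers both the standard and the generalized case.

```python
import math
from typing import Sequence


def log_q(x: float, q: float) -> float:
    """q-logarithm: (x**(q-1) - 1) / (q-1), reducing to ln(x) at q = 1."""
    if math.isclose(q, 1.0):
        return math.log(x)
    return (x ** (q - 1.0) - 1.0) / (q - 1.0)


def tsallis_entropy(probs: Sequence[float], q: float) -> float:
    """Assumed standard form S_q(p) = E_{x~p}[-ln_q p(x)].

    Outcomes with zero probability contribute nothing to the sum.
    """
    return sum(-p * log_q(p, q) for p in probs if p > 0.0)


if __name__ == "__main__":
    uniform = [0.25] * 4
    for q in (0.5, 1.0, 2.0):
        print(q, round(tsallis_entropy(uniform, q), 4))
```

For a uniform distribution over four actions, q = 0.5, 1.0 and 2.0 give 2.0, ln 4 ≈ 1.3863 and 0.75 respectively, so the same distribution receives a different entropy bonus depending on the index, while a deterministic distribution scores 0 for every q.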

Contribution

It proposes Tsallis MDPs together with a mathematical analysis of them (optimality condition, performance error bounds, and convergence), and develops a model-free actor-critic method that uses Tsallis entropy to improve RL performance.
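
Because Tsallis entropy can be written as S_q(π) = E_{a~π}[-ln_q π(a|s)], one way it can enter an actor-critic critic update is to replace the -α log π term of a soft, SAC-style Bellman target with -α ln_q π. The sketch below assumes that SAC-style target form, a single sampled next action, and a fixed temperature α; the function tsallis_critic_target and its arguments are hypothetical, and this is not the authors' actor-critic implementation.

```python
import numpy as np


def log_q(x: np.ndarray, q: float) -> np.ndarray:
    """Elementwise q-logarithm; equals np.log(x) when q == 1."""
    if np.isclose(q, 1.0):
        return np.log(x)
    return (np.power(x, q - 1.0) - 1.0) / (q - 1.0)


def tsallis_critic_target(
    reward: np.ndarray,
    done: np.ndarray,
    next_q: np.ndarray,
    next_prob: np.ndarray,
    alpha: float,
    q: float,
    gamma: float = 0.99,
) -> np.ndarray:
    """Hypothetical SAC-style critic target with ln_q in place of log.

    next_q    : Q(s', a') for an action a' sampled from the current policy
    next_prob : pi(a' | s') for that same sampled action
    done      : 1.0 for terminal transitions, 0.0 otherwise
    """
    # One-sample estimate of the Tsallis-regularized value E_pi[Q - alpha * ln_q pi].
    soft_value = next_q - alpha * log_q(next_prob, q)
    # Terminal transitions do not bootstrap.
    return reward + gamma * (1.0 - done) * soft_value
```

With q = 1 the target reduces to the familiar log-based soft target, so the entropic index is the only change needed to move between the Shannon-Gibbs and Tsallis versions in this sketch.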

Findings

01

Tsallis entropy controls exploration behavior.

02

Different entropic indices suit different RL problems.

03

Achieved state-of-the-art results on MuJoCo tasks.

Abstract

In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter, called an entropic index. By controlling the entropic index, we can generate various types of entropy, including the SG entropy, and a different entropy results in a different class of optimal policy in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs, including the optimality condition, performance error bounds, and convergence. Our theoretical result enables us to use any positive entropic index in RL. To handle complex and large-scale problems,…
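
To see how the entropic index changes the class of optimal policy, consider the simplest case: a single state where the policy p maximizes Σ_i p_i r_i + α S_q(p). The sketch assumes the standard Tsallis form S_q(p) = (1 - Σ_i p_i^q)/(q - 1); the solver name and the bisection approach are illustrative choices, not the paper's derivation. Setting the gradient equal to a Lagrange multiplier λ gives p_i = z_i^(1/(q-1)) with z_i = (q - 1)(r_i - λ)/(αq), and λ is chosen so the probabilities sum to one.

```python
import numpy as np


def tsallis_optimal_policy(rewards, alpha: float, q: float, iters: int = 200) -> np.ndarray:
    """Maximize sum_i p_i r_i + alpha * S_q(p) over the probability simplex.

    Assumes S_q(p) = (1 - sum_i p_i**q) / (q - 1), the Shannon-Gibbs entropy at q = 1.
    Stationarity gives p_i = z_i**(1/(q-1)) with z_i = (q-1)(r_i - lam) / (alpha q);
    the multiplier lam is found by bisection so that the p_i sum to one.
    """
    r = np.asarray(rewards, dtype=float)
    if q <= 0.0:
        raise ValueError("entropic index q must be positive")
    if np.isclose(q, 1.0):
        logits = r / alpha
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()

    r_max, n = r.max(), r.size

    def probs(lam: float) -> np.ndarray:
        z = (q - 1.0) * (r - lam) / (alpha * q)
        if q > 1.0:
            # Actions with r_i <= lam get exactly zero probability (sparse policy).
            return np.maximum(z, 0.0) ** (1.0 / (q - 1.0))
        # For 0 < q < 1, lam stays above r_max, so every z_i > 0 and every p_i > 0.
        return z ** (1.0 / (q - 1.0))

    # Bracket lam so that sum(probs(lo)) >= 1 >= sum(probs(hi)); the sum decreases in lam.
    if q > 1.0:
        lo, hi = r_max - alpha * q / (q - 1.0), r_max
    else:
        lo = r_max + alpha * q / (1.0 - q)
        hi = r_max + alpha * q * n ** (1.0 - q) / (1.0 - q)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if probs(mid).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    p = probs(0.5 * (lo + hi))
    return p / p.sum()  # remove residual rounding error


if __name__ == "__main__":
    for q in (0.5, 1.0, 2.0):
        print(q, np.round(tsallis_optimal_policy([1.0, 0.9, 0.0], alpha=0.1, q=q), 4))
```

With rewards (1.0, 0.9, 0.0) and α = 0.1, q = 2 gives probabilities of 0.75 and 0.25 and exactly zero for the worst action, while q = 1 (softmax) and q = 0.5 keep every action's probability positive. In this toy setting, that difference in support is one concrete way the index shapes exploration.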

Taxonomy

Topics: stochastic dynamics and bifurcation · Reinforcement Learning in Robotics · Neural dynamics and brain function