Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning

Kyungjae Lee; Sungjoon Choi; Songhwai Oh

arXiv:1709.06293 · cs.LG · October 16, 2017

TL;DR

This paper introduces a novel sparse MDP framework with causal sparse Tsallis entropy regularization, provides a theoretical analysis and a new value iteration method, and demonstrates improved performance on reinforcement learning tasks.

Contribution

It proposes a new sparse MDP model with a novel causal sparse Tsallis entropy regularizer, proves the convergence of the associated sparse value iteration, and reports better empirical performance than existing methods.
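The convergence claim can be illustrated with a minimal sketch of sparse value iteration on a small random tabular MDP. It assumes the sparse Bellman backup V(s) = α·spmax(Q(s,·)/α), where spmax(z) = ½Σ_{i∈S(z)}(z_i² − τ(z)²) + ½ equals max_p [p·z + ½(1 − ‖p‖²)] over the probability simplex and τ(z) is the sparsemax threshold. Because spmax is monotone and shifts by c when every entry shifts by c, the backup is a γ-contraction in the sup norm, which is the property a Banach fixed-point argument relies on. The helper names (`sparsemax_threshold`, `spmax`, `sparse_bellman_backup`, `sparse_value_iteration`, `random_mdp`) and the random MDP are hypothetical, chosen only for illustration.

```python
import random


def sparsemax_threshold(z):
    """Return tau(z) such that sum_i max(z_i - tau, 0) == 1 (sparsemax threshold)."""
    zs = sorted(z, reverse=True)
    cumsum, k_star, cum_star = 0.0, 1, zs[0]
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        # Index k is in the support while 1 + k * z_(k) > sum of the top-k entries.
        if 1.0 + k * zk > cumsum:
            k_star, cum_star = k, cumsum
    return (cum_star - 1.0) / k_star


def spmax(z):
    """Sparsemax operator: max over the simplex of p.z + 0.5 * (1 - ||p||^2)."""
    tau = sparsemax_threshold(z)
    return 0.5 * sum(zi * zi - tau * tau for zi in z if zi > tau) + 0.5


def sparse_bellman_backup(P, R, V, gamma, alpha):
    """One sparse Bellman backup; P[s][a][s'] are transition probs, R[s][a] rewards."""
    V_new = []
    for s in range(len(R)):
        q = [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(R[s]))]
        V_new.append(alpha * spmax([qa / alpha for qa in q]))
    return V_new


def sparse_value_iteration(P, R, gamma=0.9, alpha=0.5, tol=1e-8, max_iter=10_000):
    """Repeat the backup until the sup-norm change falls below tol."""
    V = [0.0] * len(R)
    for it in range(1, max_iter + 1):
        V_new = sparse_bellman_backup(P, R, V, gamma, alpha)
        delta = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if delta < tol:
            return V, it
    return V, max_iter


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical random tabular MDP used only for this illustration."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
    R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return P, R


if __name__ == "__main__":
    P, R = random_mdp(n_states=5, n_actions=4, seed=1)
    V, iters = sparse_value_iteration(P, R, gamma=0.9, alpha=0.5)
    print(f"converged in {iters} iterations; V = {[round(v, 4) for v in V]}")
```

Stopping on the sup-norm change is the usual choice for a contraction: once successive iterates differ by less than `tol`, the Bellman residual is at most γ·`tol`.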

Findings

01

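Sparse MDPs have a constant bound on the performance error caused by regularization, whereas the corresponding error of soft MDPs grows logarithmically with the number of actions.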

02

Sparse MDPs outperform soft MDPs in convergence speed.

03

The proposed method achieves better reinforcement learning results.
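The first finding can be made concrete by comparing the largest per-step value each regularizer can take. Under the standard regularized-MDP argument (assumed here, not taken from the paper), the expected return lost to a regularizer scaled by α is at most α/(1 − γ) times that maximum. For the Shannon entropy used by soft MDPs the maximum is log|A|; for the q = 2 Tsallis entropy ½(1 − Σ_a π(a)²), taken here as the per-step sparse Tsallis term, it is ½(1 − 1/|A|) < ½. The function names and the values of α and γ are hypothetical.

```python
import math


def max_shannon_entropy(n_actions):
    """Largest value of -sum_a pi(a) log pi(a), attained by the uniform policy."""
    return math.log(n_actions)


def max_sparse_tsallis_entropy(n_actions):
    """Largest value of 0.5 * (1 - sum_a pi(a)^2), attained by the uniform policy."""
    return 0.5 * (1.0 - 1.0 / n_actions)


def regularization_gap_bound(alpha, gamma, max_regularizer):
    """Assumed bound alpha * max_regularizer / (1 - gamma) on the return lost to regularization."""
    return alpha * max_regularizer / (1.0 - gamma)


if __name__ == "__main__":
    alpha, gamma = 0.1, 0.99
    for n in (2, 10, 100, 10_000):
        soft = regularization_gap_bound(alpha, gamma, max_shannon_entropy(n))
        sparse = regularization_gap_bound(alpha, gamma, max_sparse_tsallis_entropy(n))
        print(f"|A|={n:>6}: soft bound {soft:8.3f}   sparse bound {sparse:6.3f}")
```

The soft bound keeps growing with |A|, while the sparse bound never exceeds α/(2(1 − γ)).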

Abstract

In this paper, a sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed. The proposed policy regularization induces a sparse and multi-modal optimal policy distribution in a sparse MDP. A full mathematical analysis of the proposed sparse MDP is provided. We first analyze the optimality condition of a sparse MDP. Then, we propose a sparse value iteration method that solves a sparse MDP and prove the convergence and optimality of sparse value iteration using the Banach fixed-point theorem. The proposed sparse MDP is compared to soft MDPs, which utilize causal entropy regularization. We show that the performance error of a sparse MDP has a constant bound, while the error of a soft MDP increases logarithmically with respect to the number of actions, where this performance error is caused by the introduced regularization term. In…
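To see why the regularizer yields a sparse, multi-modal policy, the sketch below assumes the optimality condition takes the sparsemax form π(a|s) = max(Q(s,a)/α − τ(Q(s,·)/α), 0) and contrasts it with the softmax policy π(a|s) ∝ exp(Q(s,a)/α) of a soft MDP. The Q-values, α, and function names are made up for illustration.

```python
import math


def sparsemax_threshold(z):
    """Return tau(z) with sum_i max(z_i - tau, 0) == 1."""
    zs = sorted(z, reverse=True)
    cumsum, k_star, cum_star = 0.0, 1, zs[0]
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        if 1.0 + k * zk > cumsum:  # z_(k) still belongs to the support
            k_star, cum_star = k, cumsum
    return (cum_star - 1.0) / k_star


def sparse_policy(q_values, alpha):
    """Sparsemax policy: actions with Q/alpha below the threshold get probability 0."""
    z = [q / alpha for q in q_values]
    tau = sparsemax_threshold(z)
    return [max(zi - tau, 0.0) for zi in z]


def soft_policy(q_values, alpha):
    """Softmax policy of a soft MDP: every action keeps positive probability."""
    z = [q / alpha for q in q_values]
    m = max(z)  # subtract the max for numerical stability
    w = [math.exp(zi - m) for zi in z]
    total = sum(w)
    return [wi / total for wi in w]


if __name__ == "__main__":
    q = [1.0, 0.98, 0.2, -0.5]  # two near-optimal actions, two poor ones
    print("sparse:", [round(p, 3) for p in sparse_policy(q, alpha=0.5)])
    print("soft:  ", [round(p, 3) for p in soft_policy(q, alpha=0.5)])
```

With these numbers the sparse policy splits its mass between the two near-optimal actions (0.52 and 0.48) and assigns exactly zero to the other two, whereas the softmax policy gives every action positive probability.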

Taxonomy

Methods: Entropy Regularization