Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
Kyungjae Lee, Sungjoon Choi, Songhwai Oh

TL;DR
This paper introduces a novel sparse MDP framework with causal sparse Tsallis entropy regularization, provides a theoretical analysis and a new value iteration method, and demonstrates improved performance on reinforcement learning tasks.
Contribution
It proposes a new sparse MDP model regularized by causal sparse Tsallis entropy, proves the convergence and optimality of the proposed sparse value iteration, and reports better empirical performance than existing methods.
Findings
The performance error bound of a sparse MDP is constant in the number of actions.
Sparse MDPs outperform soft MDPs in convergence speed.
The proposed method achieves better reinforcement learning results.
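The constant-versus-logarithmic contrast in the first finding can be illustrated numerically. This is a minimal sketch under stated assumptions, not the paper's theorem: it assumes the standard argument that, for a nonnegative regularizer W scaled by α, the optimal regularized policy loses at most α·max W/(1 − γ) in unregularized return, and it assumes the sparse Tsallis entropy takes the form W(π) = ½ Σ_a π(a)(1 − π(a)). Both maxima are attained by the uniform policy: log|A| for Shannon entropy and (|A| − 1)/(2|A|) < ½ for sparse Tsallis entropy. The function names and the values α = 1, γ = 0.9 are illustrative, and the exact constants in the paper may differ.

```python
import math


def shannon_entropy(p):
    """Shannon entropy H(p) = -sum p_i log p_i (the soft MDP regularizer)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def sparse_tsallis_entropy(p):
    """Assumed sparse Tsallis entropy W(p) = 1/2 * sum p_i (1 - p_i)."""
    return 0.5 * sum(pi * (1.0 - pi) for pi in p)


def error_bounds(n_actions, alpha, gamma):
    """Assumed bound alpha * max_regularizer / (1 - gamma) for soft and sparse MDPs.

    Both regularizers are maximized by the uniform distribution over actions.
    """
    uniform = [1.0 / n_actions] * n_actions
    soft = alpha * shannon_entropy(uniform) / (1.0 - gamma)
    sparse = alpha * sparse_tsallis_entropy(uniform) / (1.0 - gamma)
    return soft, sparse


for n in (2, 10, 100, 1000):
    soft, sparse = error_bounds(n, alpha=1.0, gamma=0.9)
    print(f"|A|={n:5d}  soft bound={soft:7.3f}  sparse bound={sparse:6.3f}")
```

Under these assumptions the soft bound keeps growing like log|A|, while the sparse bound never exceeds α/(2(1 − γ)), which is the sense in which the sparse MDP's error bound is constant.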
Abstract
In this paper, a sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed. The proposed policy regularization induces a sparse and multi-modal optimal policy distribution of a sparse MDP. The full mathematical analysis of the proposed sparse MDP is provided. We first analyze the optimality condition of a sparse MDP. Then, we propose a sparse value iteration method that solves a sparse MDP and prove the convergence and optimality of sparse value iteration using the Banach fixed point theorem. The proposed sparse MDP is compared to soft MDPs, which utilize causal entropy regularization. We show that the performance error of a sparse MDP has a constant bound, while the error of a soft MDP increases logarithmically with respect to the number of actions, where this performance error is caused by the introduced regularization term. In…
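The optimality condition and sparse value iteration described in the abstract can be sketched for a small tabular MDP. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the sparse Tsallis entropy W(π) = ½ Σ_a π(a)(1 − π(a)), under which the regularized Bellman backup becomes V(s) = α·spmax(Q(s,·)/α) and the optimal policy is the sparsemax projection π(a|s) = max(Q(s,a)/α − τ(Q(s,·)/α), 0). The names `sparsemax`, `spmax`, `q_values`, and `sparse_value_iteration`, the temperature `alpha`, and the list-of-lists layout of `P` and `R` are hypothetical choices for illustration. Because the backup is monotone and adding a constant c to every Q-value adds exactly c to V, it is a γ-contraction in the max norm, which is what the Banach fixed point argument relies on.

```python
def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.

    Returns (p, tau) with p[i] = max(z[i] - tau, 0) and sum(p) == 1.
    """
    zs = sorted(z, reverse=True)
    cumsum, k_star, cum_star = 0.0, 1, zs[0]
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        # Support condition: 1 + k * z_(k) > sum of the k largest entries.
        if 1.0 + k * zk > cumsum:
            k_star, cum_star = k, cumsum
    tau = (cum_star - 1.0) / k_star
    return [max(zi - tau, 0.0) for zi in z], tau


def spmax(z):
    """max over the simplex of p.z + W(p), with W(p) = 1/2 * (1 - sum p_i^2)."""
    p, tau = sparsemax(z)
    return 0.5 * sum(zi * zi - tau * tau for zi, pi in zip(z, p) if pi > 0) + 0.5


def q_values(P, R, V, gamma):
    """Q(s, a) = R[s][a] + gamma * sum_t P[s][a][t] * V[t]."""
    return [[R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
             for a in range(len(R[s]))] for s in range(len(R))]


def sparse_value_iteration(P, R, gamma, alpha, tol=1e-8, max_iter=10_000):
    """Iterate V(s) <- alpha * spmax(Q(s, .) / alpha) until the sup-norm change < tol."""
    V = [0.0] * len(R)
    for _ in range(max_iter):
        Q = q_values(P, R, V, gamma)
        V_new = [alpha * spmax([q / alpha for q in Q_s]) for Q_s in Q]
        delta = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    Q = q_values(P, R, V, gamma)
    # Optimal sparse policy: sparsemax of Q(s, .) / alpha in each state.
    policy = [sparsemax([q / alpha for q in Q_s])[0] for Q_s in Q]
    return V, Q, policy


# Usage: one state that loops back to itself, three actions.
P = [[[1.0], [1.0], [1.0]]]
R = [[1.0, 0.9, 0.0]]
V, Q, policy = sparse_value_iteration(P, R, gamma=0.9, alpha=1.0)
print(V, policy)  # V ≈ [12.025], policy ≈ [[0.55, 0.45, 0.0]]
```

In this one-state example the lowest-reward action receives exactly zero probability while the two better actions share the mass. A softmax policy, as in a soft MDP, would instead assign every action positive probability, so this is the sparsity the regularization is designed to induce.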
Taxonomy
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Entropy Regularization
