Robust Entropy-regularized Markov Decision Processes
Tien Mai, Patrick Jaillet

TL;DR
This paper introduces a robust entropy-regularized Markov decision process (ER-MDP) model whose stochastic optimal policies remain robust to ambiguity in the state transition probabilities, and shows how the model can be integrated into existing RL algorithms.
Contribution
The paper develops a tractable robust ER-MDP framework that combines entropy regularization with ambiguity in the transition probabilities, and shows that essential properties of the non-robust ER-MDP and the robust unregularized MDP carry over to this setting.
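To make the combination concrete, the following is a sketch of how a robust soft Bellman equation is commonly written. It is not taken from the paper: the temperature $\tau$, the discount factor $\gamma$, the reward $r(s,a)$, and the per-state-action uncertainty sets $\mathcal{P}_{s,a}$ (an $(s,a)$-rectangularity assumption) are notation assumed here for illustration.

```latex
% Non-robust ER-MDP (soft Bellman equation) with a known kernel p:
V(s) = \tau \log \sum_{a} \exp\!\left( \frac{1}{\tau}
       \Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, V(s') \Big] \right)

% Robust variant: the known kernel is replaced by the worst case over
% an assumed (s,a)-rectangular uncertainty set \mathcal{P}_{s,a}:
V(s) = \tau \log \sum_{a} \exp\!\left( \frac{1}{\tau}
       \Big[ r(s,a) + \gamma \min_{p \in \mathcal{P}_{s,a}} \sum_{s'} p(s')\, V(s') \Big] \right)
```

Under these assumptions the only change from the non-robust equation is the inner minimization, which is one way to see why properties such as contraction of the Bellman operator can plausibly survive in the robust setting.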
Findings
The robust ER-MDP retains essential properties of the non-robust ER-MDP and robust unregularized MDP models.
The framework can be integrated into value and policy iteration algorithms.
The paper provides complexity analysis and error bounds for the proposed methods.
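As a rough illustration of what integrating robustness into value iteration can look like, here is a minimal sketch of soft value iteration with a worst-case backup. It is not the authors' algorithm: the finite list of candidate transition distributions per state-action pair, the temperature `tau`, and all function names are assumptions made for this example.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def robust_q(rewards, candidates, values, gamma):
    """Worst-case action values: reward plus discounted minimum expected next value."""
    q = []
    for s, row in enumerate(rewards):
        q_row = []
        for a, r in enumerate(row):
            # Assumed uncertainty set: a finite list of distributions per (s, a).
            worst = min(
                sum(p * v for p, v in zip(dist, values))
                for dist in candidates[s][a]
            )
            q_row.append(r + gamma * worst)
        q.append(q_row)
    return q


def robust_soft_value_iteration(rewards, candidates, gamma=0.9, tau=1.0,
                                tol=1e-10, max_iter=10_000):
    """Iterate the robust soft Bellman backup until the sup-norm change is below tol."""
    values = [0.0] * len(rewards)
    for _ in range(max_iter):
        q = robust_q(rewards, candidates, values, gamma)
        new_values = [tau * logsumexp([x / tau for x in row]) for row in q]
        gap = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if gap < tol:
            break
    # Stochastic (softmax) policy derived from the final worst-case action values.
    q = robust_q(rewards, candidates, values, gamma)
    policy = []
    for row in q:
        z = logsumexp([x / tau for x in row])
        policy.append([math.exp(x / tau - z) for x in row])
    return values, policy


# Toy problem (hypothetical numbers): 2 states, 2 actions,
# two candidate next-state distributions per (state, action).
rewards = [[1.0, 0.0], [0.0, 0.5]]
candidates = [
    [[[0.9, 0.1], [0.6, 0.4]], [[0.2, 0.8], [0.5, 0.5]]],
    [[[0.7, 0.3], [0.4, 0.6]], [[0.1, 0.9], [0.3, 0.7]]],
]
# Nominal model: keep only the first candidate for each (state, action).
nominal = [[[dists[0]] for dists in row] for row in candidates]

values, policy = robust_soft_value_iteration(rewards, candidates)
nominal_values, _ = robust_soft_value_iteration(rewards, nominal)
print("robust values: ", values)
print("nominal values:", nominal_values)
print("robust policy: ", policy)
```

In this sketch the robust values can never exceed the nominal ones, because the worst-case backup minimizes over a set that contains the nominal distribution. Richer uncertainty sets (for example, divergence balls around an estimate) would replace the `min` over a finite list with an inner optimization.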
Abstract
Stochastic and soft optimal policies resulting from entropy-regularized Markov decision processes (ER-MDP) are desirable for exploration and imitation learning applications. Motivated by the fact that such policies are sensitive with respect to the state transition probabilities, and the estimation of these probabilities may be inaccurate, we study a robust version of the ER-MDP model, where the stochastic optimal policies are required to be robust with respect to the ambiguity in the underlying transition probabilities. Our work is at the crossroads of two important schemes in reinforcement learning (RL), namely, robust MDP and entropy regularized MDP. We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings, making the robust ER-MDP problem tractable. We show how our framework and results can be integrated into…
Taxonomy
Topics: Reinforcement Learning in Robotics
