Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty
Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar

TL;DR
This paper introduces robust constrained Markov decision processes (RCMDPs), which give reinforcement learning (RL) algorithms performance and safety guarantees under model uncertainty. Such guarantees are especially useful in real-world applications such as simulation-to-real (Sim2Real) policy transfer.
Contribution
The paper merges the theories of constrained MDPs (CMDPs) and robust MDPs (RMDPs) into RCMDPs, enabling the design of robust RL algorithms with constraint satisfaction guarantees under model uncertainty.
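For reference, one standard way to write an RCMDP is as a max-min program with a worst-case constraint, together with the Lagrangian that a Lagrangian-based policy gradient method would typically optimize. The symbols below (uncertainty set \mathcal{P} of transition kernels, reward r, constraint cost c, budget \beta, discount \gamma, initial distribution \mu_0) are assumed notation for this sketch and may differ from the paper's own.

```latex
\begin{aligned}
J_f(\pi, P) &= \mathbb{E}_{s_0 \sim \mu_0,\, \pi,\, P}\Big[\sum_{t=0}^{\infty} \gamma^t f(s_t, a_t)\Big], \qquad f \in \{r, c\} \\
&\max_{\pi} \; \min_{P \in \mathcal{P}} J_r(\pi, P)
\quad \text{subject to} \quad
\max_{P \in \mathcal{P}} J_c(\pi, P) \le \beta \\
\mathcal{L}(\pi, \lambda) &= \min_{P \in \mathcal{P}} J_r(\pi, P) - \lambda \Big( \max_{P \in \mathcal{P}} J_c(\pi, P) - \beta \Big), \qquad \lambda \ge 0
\end{aligned}
```

The reward is evaluated under the least favorable kernel in \mathcal{P} and the cost under the most unfavorable one, so a policy that is feasible for this problem satisfies the constraint for every kernel in the set.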
Findings
Proposed a Lagrangian-based robust policy gradient algorithm.
Validated the approach on an inventory management problem.
Demonstrated robustness and safety guarantees in uncertain environments.
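To make the Lagrangian idea concrete, here is a minimal primal-dual sketch in Python/NumPy. It is not the authors' algorithm: it assumes a small tabular problem, a finite uncertainty set given as a list of candidate transition kernels (`models`), exact policy evaluation instead of sampled rollouts, and a softmax policy. All function names (`softmax_policy`, `evaluate`, `value_and_grad`, `robust_lagrangian_step`) are hypothetical. The policy parameters ascend L = min_P J_r - lam * (max_P J_c - beta), using the gradient taken at the worst-case kernel, while the multiplier lam grows whenever the worst-case cost exceeds the budget beta.

```python
import numpy as np


def softmax_policy(theta):
    """Row-wise softmax: theta has shape (S, A); returns pi[s, a] = pi(a | s)."""
    z = theta - theta.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def evaluate(pi, P, f, mu0, gamma):
    """Exact discounted value of a per-step signal f (reward or cost) under kernel P.

    P: (S, A, S) transition probabilities; f: (S, A); mu0: (S,) initial distribution.
    Returns J, V, Q and the unnormalized discounted visitation d = mu0^T (I - gamma P_pi)^{-1}.
    """
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel induced by pi
    f_pi = (pi * f).sum(axis=1)            # expected one-step signal in each state
    M = np.eye(S) - gamma * P_pi
    V = np.linalg.solve(M, f_pi)
    Q = f + gamma * (P @ V)                # (S, A, S) @ (S,) -> (S, A)
    d = np.linalg.solve(M.T, mu0)
    return float(mu0 @ V), V, Q, d


def value_and_grad(theta, P, f, mu0, gamma):
    """J and its exact softmax policy gradient dJ/dtheta[s, a] = d(s) * pi(a|s) * A(s, a)."""
    pi = softmax_policy(theta)
    J, V, Q, d = evaluate(pi, P, f, mu0, gamma)
    adv = Q - V[:, None]
    return J, d[:, None] * pi * adv


def robust_lagrangian_step(theta, lam, models, r, c, beta, mu0, gamma, lr_theta, lr_lam):
    """One primal-dual step on L = min_P J_r - lam * (max_P J_c - beta) over a finite set of kernels."""
    Jr, gr = min((value_and_grad(theta, P, r, mu0, gamma) for P in models), key=lambda t: t[0])
    Jc, gc = max((value_and_grad(theta, P, c, mu0, gamma) for P in models), key=lambda t: t[0])
    # Primal ascent, with gradients taken at the worst-case kernels (a Danskin-style subgradient).
    theta = theta + lr_theta * (gr - lam * gc)
    # Projected dual descent: dL/dlam = -(Jc - beta), so lam rises while the worst-case cost exceeds beta.
    lam = max(0.0, lam + lr_lam * (Jc - beta))
    return theta, lam, Jr, Jc


# Toy usage: a random 3-state, 2-action problem with a hypothetical uncertainty set of 3 kernels.
rng = np.random.default_rng(0)
S, A, gamma, beta = 3, 2, 0.9, 5.0
models = [rng.dirichlet(np.ones(S), size=(S, A)) for _ in range(3)]
r = rng.uniform(size=(S, A))
c = rng.uniform(size=(S, A))
mu0 = np.full(S, 1.0 / S)

theta, lam = np.zeros((S, A)), 0.0
for _ in range(2000):
    theta, lam, Jr, Jc = robust_lagrangian_step(theta, lam, models, r, c, beta, mu0, gamma, 0.1, 0.05)
print(f"worst-case return {Jr:.3f}, worst-case cost {Jc:.3f} (budget {beta}), lambda {lam:.3f}")
```

The toy problem is random, so the printed numbers only illustrate the loop. Plain primal-dual updates like this can oscillate around the constraint boundary, and the sketch makes no convergence or feasibility claim; the paper's method should be consulted for how the worst-case model and gradients are actually obtained.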
Abstract
In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. Specifically, in the framework of model-based RL, we propose to merge the theory of constrained Markov decision processes (CMDPs) with the theory of robust Markov decision processes (RMDPs), leading to a formulation of robust constrained MDPs (RCMDPs). This formulation, simple in essence, allows us to design RL algorithms that are robust in performance and that provide constraint satisfaction guarantees with respect to uncertainties in the system's state-transition probabilities. RCMDPs are important for real-life applications of RL. For instance, such a formulation can play an important role in policy transfer from simulation to the real world (Sim2Real) in safety-critical applications, which would benefit from performance and safety guarantees which are robust…
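The practical difference between a nominal and a robust constraint can be seen in a small hypothetical example. A fixed policy induces a 2-state Markov chain whose transition probabilities are only known to lie in a finite set of two kernels; for brevity the sketch works directly with these policy-induced kernels. The discounted cost is checked against a budget under the nominal kernel alone (a plain CMDP-style check) and under the worst kernel in the set (the robust check). The chain, costs, discount, budget and the helper `discounted_cost` are all made up for illustration, and a finite set stands in for a general uncertainty set.

```python
import numpy as np


def discounted_cost(P_pi, c, mu0, gamma):
    """mu0^T (I - gamma P_pi)^{-1} c for the state-to-state kernel P_pi of a fixed policy."""
    S = P_pi.shape[0]
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, c)
    return float(mu0 @ V)


# Hypothetical 2-state chain under a fixed policy: state 1 costs 1 per step, state 0 is free.
gamma, beta = 0.5, 0.2
c = np.array([0.0, 1.0])
mu0 = np.array([1.0, 0.0])
nominal = np.array([[0.9, 0.1], [1.0, 0.0]])
perturbed = np.array([[0.7, 0.3], [1.0, 0.0]])  # another plausible kernel in the uncertainty set
uncertainty_set = [nominal, perturbed]

costs = [discounted_cost(P, c, mu0, gamma) for P in uncertainty_set]
nominal_ok = costs[0] <= beta   # CMDP-style check: nominal model only
robust_ok = max(costs) <= beta  # RCMDP-style check: worst case over the set
print(costs, nominal_ok, robust_ok)
```

Here the nominal discounted cost is 2p/(2+p) with p = 0.1, about 0.095, which is within the budget of 0.2, while the perturbed kernel (p = 0.3) gives about 0.261. The policy therefore passes the nominal check but fails the robust one, which is exactly the situation a worst-case constraint is meant to rule out.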
