Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes
David M. Bossens

TL;DR
This paper introduces two algorithms, RCPG with Robust Lagrangian and Adversarial RCPG, that improve robustness and incremental learning in robust constrained Markov decision processes (RCMDPs). Both outperform standard RCPG variants in empirical tests.
Contribution
The paper proposes two new algorithms that improve robustness and incremental learning in RCMDPs. The first computes the worst-case dynamics from the Lagrangian. The second learns adversarial policies for the dynamics incrementally.
Findings
Both algorithms outperform traditional RCPG variants.
Adversarial RCPG ranks among the top two in all tests.
Both algorithms are robust in inventory-management and navigation tasks.
Abstract
The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through the use of an uncertainty set. Simulating RCMDPs requires computing the worst-case dynamics based on value estimates for each state, an approach which has previously been used in the Robust Constrained Policy Gradient (RCPG). Highlighting potential downsides of RCPG such as not robustifying the full constrained objective and the lack of incremental learning, this paper introduces two algorithms, called RCPG with Robust Lagrangian and Adversarial RCPG. RCPG with Robust Lagrangian modifies RCPG by taking the worst-case dynamics based on the Lagrangian rather than either the value or the constraint. Adversarial RCPG also formulates the worst-case…
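The difference between a value-based and a Lagrangian-based worst case can be illustrated with a minimal sketch. It relies on assumptions that the abstract does not state. The setting is tabular. The uncertainty set for one state-action pair is a finite list of candidate next-state distributions. Rewards are maximised subject to a cost constraint V_c ≤ d. The Lagrangian value of a next state is taken to be V_r(s') minus λ V_c(s'). The helper `worst_case_index` is hypothetical and is not the paper's implementation.

```python
import numpy as np


def worst_case_index(candidates, v_reward, v_cost, lam):
    """Return the index of the candidate next-state distribution that
    minimises the expected Lagrangian value (hypothetical helper).

    candidates: array (n_candidates, n_states), each row a distribution
                over next states taken from an assumed finite uncertainty set.
    v_reward:   array (n_states,), reward value estimates V_r(s').
    v_cost:     array (n_states,), cost value estimates V_c(s').
    lam:        Lagrange multiplier (lam >= 0) for the cost constraint.
    """
    # Assumed Lagrangian value per next state: V_r(s') - lam * V_c(s').
    lagrangian_values = v_reward - lam * v_cost
    # Expected Lagrangian value under each candidate distribution.
    expected = candidates @ lagrangian_values
    # The adversary picks the distribution that is worst for the Lagrangian.
    return int(np.argmin(expected))


candidates = np.array([
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.2, 0.8, 0.0],
])
v_reward = np.array([1.0, 0.0, 0.5])
v_cost = np.array([0.0, 0.0, 2.0])

print(worst_case_index(candidates, v_reward, v_cost, lam=1.0))  # 1
print(worst_case_index(candidates, v_reward, v_cost, lam=0.0))  # 2
```

With lam=0 the criterion reduces to a value-only worst case and selects candidate 2, which never reaches the costly state 2. With lam=1 it selects candidate 1. That candidate has a higher reward value but sends half its probability mass to state 2, so it is worse for the full constrained objective. The example shows that the worst case for the value alone need not be the worst case for the Lagrangian.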
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Fault Detection and Control Systems
