Flipping-based Policy for Chance-Constrained Markov Decision Processes
Xun Shen, Shuo Jiang, Akifumi Wachi, Kazumune Hashimoto, Sebastien Gros

TL;DR
This paper introduces a flipping-based policy for chance-constrained Markov decision processes (CCMDPs), a safe reinforcement learning approach that handles safety under uncertainty and improves the performance of existing safe RL algorithms.
Contribution
The paper proposes a flipping-based policy framework for CCMDPs, establishes a Bellman equation for CCMDPs, and demonstrates the framework's effectiveness on safe RL benchmarks.
Findings
A flipping-based policy exists within the optimal solution set for CCMDPs.
Chance constraints can be approximated by expected cumulative safety constraints (ECSCs), enabling practical implementation.
The framework improves safe RL algorithm performance on Safety Gym benchmarks.
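One elementary way to see why an expected cumulative constraint can stand in for a chance constraint is the union bound: the probability that a trajectory contains at least one unsafe step never exceeds the expected number of unsafe steps. The sketch below checks this numerically on a toy model. It is a standard inequality, not the paper's specific approximation result, and the finite horizon, the undiscounted count, the independent per-step violations, and all function names are assumptions made for illustration.

```python
import random


def simulate_unsafe_flags(horizon, p_unsafe, rng):
    """Toy trajectory: each step is unsafe independently with prob p_unsafe."""
    return [rng.random() < p_unsafe for _ in range(horizon)]


def estimate(n_traj, horizon, p_unsafe, seed=0):
    """Monte Carlo estimates of the two safety measures.

    Returns (p_any, expected_count), where
      p_any          ~ P(at least one unsafe step)      (chance-constraint side)
      expected_count ~ E[number of unsafe steps]        (cumulative-cost side)
    """
    rng = random.Random(seed)
    any_violation = 0
    total_unsafe = 0
    for _ in range(n_traj):
        flags = simulate_unsafe_flags(horizon, p_unsafe, rng)
        any_violation += any(flags)   # 1 if the trajectory is ever unsafe
        total_unsafe += sum(flags)    # number of unsafe steps
    return any_violation / n_traj, total_unsafe / n_traj


p_any, expected_count = estimate(n_traj=5000, horizon=20, p_unsafe=0.01)
# The bound holds sample by sample, so it also holds for the estimates.
print(p_any <= expected_count)
```

Because the indicator of "any unsafe step" is at most the count of unsafe steps on every trajectory, bounding the expected count by a threshold also bounds the violation probability by that threshold, at the price of some conservatism.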
Abstract
Safe reinforcement learning (RL) is a promising approach for many real-world decision-making problems where ensuring safety is a critical necessity. In safe RL research, while expected cumulative safety constraints (ECSCs) are typically the first choices, chance constraints are often more pragmatic for incorporating safety under uncertainties. This paper proposes a flipping-based policy for Chance-Constrained Markov Decision Processes (CCMDPs). The flipping-based policy selects the next action by tossing a potentially distorted coin between two action candidates. The probability of the flip and the two action candidates vary depending on the state. We establish a Bellman equation for CCMDPs and further prove the existence of a flipping-based policy within the optimal solution sets. Since solving the problem with joint chance constraints is challenging in practice, we then prove…
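The action-selection rule described in the abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the `FlippingPolicy` class, the `candidates` callable, and the toy rule `toy_candidates` are hypothetical names introduced here, and in practice the two candidates and the flip probability would come from learned, state-dependent functions.

```python
import random
from dataclasses import dataclass
from typing import Callable, Hashable, Tuple

State = Hashable
Action = Hashable


@dataclass
class FlippingPolicy:
    # Hypothetical interface: maps a state to (action_a, action_b, prob_a),
    # where prob_a is the probability of choosing action_a.
    candidates: Callable[[State], Tuple[Action, Action, float]]

    def act(self, state: State, rng: random.Random) -> Action:
        action_a, action_b, prob_a = self.candidates(state)
        if not 0.0 <= prob_a <= 1.0:
            raise ValueError("flip probability must lie in [0, 1]")
        # Toss the (possibly biased) coin between the two candidates.
        return action_a if rng.random() < prob_a else action_b


def toy_candidates(state: float) -> Tuple[str, str, float]:
    """Toy rule, not from the paper: treat the state as a risk level in
    [0, 1] and pick the cautious action with probability equal to it."""
    risk = min(max(float(state), 0.0), 1.0)
    return "cautious", "aggressive", risk


policy = FlippingPolicy(candidates=toy_candidates)
rng = random.Random(0)
draws = [policy.act(0.25, rng) for _ in range(10_000)]
cautious_fraction = draws.count("cautious") / len(draws)
print(f"fraction of cautious actions at risk 0.25: {cautious_fraction:.3f}")
```

Both the candidate pair and the coin bias are functions of the state, so the policy is stochastic but restricted to a two-point distribution at every state.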
Taxonomy
Topics: Simulation Techniques and Applications · Complex Systems and Decision Making · Bayesian Modeling and Causal Inference
Methods: FLIP
