Flipping-based Policy for Chance-Constrained Markov Decision Processes

Xun Shen; Shuo Jiang; Akifumi Wachi; Kazumune Hashimoto; Sebastien Gros

arXiv:2410.06474 · cs.LG · October 10, 2024

TL;DR

This paper introduces a flipping-based policy for chance-constrained Markov decision processes (CCMDPs): at each state, the policy chooses between two candidate actions by tossing a possibly biased coin. The approach targets safe reinforcement learning under uncertainty and is reported to improve existing safe RL algorithms on Safety Gym benchmarks.

Contribution

The paper proposes a flipping-based policy framework for CCMDPs, establishes a Bellman equation for them, and demonstrates the framework's effectiveness on safe RL benchmarks.

Findings

01

The flipping-based policy exists within the optimal solution set for CCMDPs.

02

Chance constraints can be approximated by expected cumulative safety constraints (ECSCs), enabling practical implementation.

03

The framework improves safe RL algorithm performance on Safety Gym benchmarks.
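
For orientation, the two constraint types contrasted in the second finding are commonly written as shown here. This is a generic sketch in standard notation, not the paper's exact formulation. All symbols are assumptions introduced for illustration: a safe set $\mathbb{S}$, a horizon $T$, a violation tolerance $\alpha$, a per-step safety cost $c$, a discount factor $\gamma$, and a cost budget $d$.

```latex
% Joint chance constraint (assumed generic form): the whole trajectory
% stays in the safe set with probability at least 1 - alpha.
\Pr\bigl\{\, s_t \in \mathbb{S},\ \forall t = 1, \dots, T \;\big|\; s_0, \pi \,\bigr\} \;\ge\; 1 - \alpha

% Expected cumulative safety constraint (assumed generic form): the
% expected discounted sum of per-step safety costs stays within a budget d.
\mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t) \,\right] \;\le\; d
```

The first form bounds the probability of any violation along a trajectory, while the second only bounds an average cost, which is why the two are not interchangeable without an approximation argument.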

Abstract

Safe reinforcement learning (RL) is a promising approach for many real-world decision-making problems where ensuring safety is critical. In safe RL research, while expected cumulative safety constraints (ECSCs) are typically the first choice, chance constraints are often more pragmatic for incorporating safety under uncertainties. This paper proposes a flipping-based policy for Chance-Constrained Markov Decision Processes (CCMDPs). The flipping-based policy selects the next action by tossing a potentially distorted coin between two action candidates. The probability of the flip and the two action candidates vary depending on the state. We establish a Bellman equation for CCMDPs and further prove the existence of a flipping-based policy within the optimal solution sets. Since solving the problem with joint chance constraints is challenging in practice, we then prove…
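
The action-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FlipRule`, `flipping_policy`, and `toy_rule`, as well as the toy states, actions, and probabilities, are hypothetical and chosen only to show a state-dependent biased coin flip between two candidate actions.

```python
import random
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass(frozen=True)
class FlipRule:
    """Hypothetical container: two candidate actions and the coin bias."""
    action_a: Hashable
    action_b: Hashable
    prob_a: float  # probability of selecting action_a, in [0, 1]


def flipping_policy(
    state: Hashable,
    rule_for: Callable[[Hashable], FlipRule],
    rng: random.Random,
) -> Hashable:
    """Pick the next action by tossing a (possibly biased) coin.

    Both the candidates and the bias are looked up per state, so they
    may differ from one state to the next.
    """
    rule = rule_for(state)
    if not 0.0 <= rule.prob_a <= 1.0:
        raise ValueError("prob_a must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so action_a is chosen with
    # probability prob_a and action_b otherwise.
    return rule.action_a if rng.random() < rule.prob_a else rule.action_b


def toy_rule(state: int) -> FlipRule:
    """Made-up state-dependent rule, for illustration only."""
    if state == 0:
        return FlipRule("fast", "slow", 0.8)
    return FlipRule("brake", "coast", 0.25)


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [flipping_policy(0, toy_rule, rng) for _ in range(10_000)]
    print("share of 'fast' in state 0:", picks.count("fast") / len(picks))
```

In a learned setting, the lookup `rule_for` would presumably be replaced by a parameterized model that outputs the two candidates and the flip probability; that part is left out here because the excerpt does not describe it.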

Videos

Flipping-based Policy for Chance-Constrained Markov Decision Processes · SlidesLive

Taxonomy

Topics: Simulation Techniques and Applications · Complex Systems and Decision Making · Bayesian Modeling and Causal Inference

Methods: FLIP