Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

Chengyang Ying; Xinning Zhou; Hang Su; Dong Yan; Ning Chen; Jun Zhu

arXiv:2206.04436 · cs.LG · May 20, 2025 · 6 cites

TL;DR

This paper introduces CPPO, a risk-sensitive reinforcement learning method that constrains the Conditional Value-at-Risk (CVaR) to improve safety and robustness under both transition and observation disturbances. The method is validated on MuJoCo tasks.

Contribution

The paper provides a theoretical analysis linking performance degradation under disturbance to a new metric, Value Function Range (VFR), and proposes a CVaR-based constrained optimization algorithm for safer RL.
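The abstract describes VFR (Value Function Range) as the gap in the value function between the best state and the worst state. One way to write this down, using notation that is assumed here for illustration (a policy $\pi$, its value function $V^{\pi}$, and a state space $\mathcal{S}$) and that may differ from the paper's own symbols:

```latex
\mathrm{VFR}(\pi) \;=\; \max_{s \in \mathcal{S}} V^{\pi}(s) \;-\; \min_{s \in \mathcal{S}} V^{\pi}(s)
```

Under this reading, a policy whose value is nearly the same in every state has a small VFR, while a policy with a few very bad states has a large one.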

Findings

01. CPPO achieves higher cumulative rewards on MuJoCo tasks.

02. CPPO is robust to transition and observation disturbances.

03. Performance degradation under disturbance is theoretically linked to VFR.

Abstract

Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transition and observation. Most of the existing methods for safe reinforcement learning can only handle transition disturbance or observation disturbance since these two kinds of disturbance affect different parts of the agent; besides, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first theoretically prove that the performance degradation under transition disturbance and observation disturbance depends on a novel metric of Value Function Range (VFR), which corresponds to the gap in the value function between the best state and the worst state. Based on the analysis, we adopt conditional value-at-risk (CVaR) as an assessment of risk and propose a novel reinforcement learning…
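To make the risk measure concrete, here is a minimal sketch of an empirical CVaR computed over sampled episode returns, next to the plain mean. It uses the common lower-tail convention (the average of the worst `alpha` fraction of returns). The function name `empirical_cvar`, the sample data, and the choice of `alpha` are illustrative assumptions; this is not the estimator or the constraint form used in CPPO.

```python
import math


def empirical_cvar(returns, alpha):
    """Average of the worst ceil(alpha * n) returns (lower-tail convention).

    Illustrative helper only; not taken from the CPPO code base.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)                    # worst returns first
    k = max(1, math.ceil(alpha * len(ordered)))  # size of the lower tail
    return sum(ordered[:k]) / k


# Hypothetical episode returns: mostly fine, with one catastrophic episode.
episode_returns = [10.0, 12.0, 11.0, -50.0, 9.0, 13.0, 8.0, 12.0, 10.0, 11.0]

mean_return = sum(episode_returns) / len(episode_returns)
tail_return = empirical_cvar(episode_returns, alpha=0.2)

print("mean return:", mean_return)
print("CVaR (worst 20%):", tail_return)
```

The mean hides the catastrophic episode, while the tail average is dominated by it. A constraint on a tail statistic of this kind is less pessimistic than optimizing the single worst-case return, because it averages over a fraction of bad outcomes rather than only the worst one.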

Code & Models

Repositories

yingchengyang/CPPO (PyTorch, official)

Taxonomy

Topics: Safety Systems Engineering in Autonomy