Embedding Safety into RL: A New Take on Trust Region Methods

Nikola Milosevic; Johannes Müller; Nico Scherf

arXiv:2411.02957 · cs.LG · August 18, 2025

TL;DR

This paper introduces C-TRPO (Constrained Trust Region Policy Optimization), a reinforcement learning method that reshapes the geometry of the policy space so that trust regions contain only safe policies. This keeps safety constraints satisfied throughout training while maintaining competitive returns.

Contribution

C-TRPO is a new trust region method that ensures safety constraints are maintained throughout training, with theoretical analysis and empirical validation.

Findings

01. C-TRPO reduces constraint violations effectively.

02. C-TRPO maintains competitive reward performance.

03. Theoretical connections to TRPO, NPG, and CPO are established.

Abstract

Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by enforcing safety constraints, yet existing methods either sacrifice reward maximization or allow unsafe training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space geometry to ensure trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
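
The idea of a trust region that contains only safe policies can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: a single-state problem with one risky action (reward 1, cost 1) and one safe action (reward 0, cost 0), a log barrier on the constraint slack, and a brute-force grid search in place of a real policy update. The barrier actually used in the paper may differ. The sketch compares a plain KL trust region step with a step whose divergence adds a barrier term that grows without bound as the expected cost approaches the budget.

```python
import math

def bernoulli_kl(p_new, p_old):
    """KL(Bernoulli(p_new) || Bernoulli(p_old)) for probabilities strictly in (0, 1)."""
    return (p_new * math.log(p_new / p_old)
            + (1 - p_new) * math.log((1 - p_new) / (1 - p_old)))

def barrier_divergence(slack_new, slack_old):
    """Bregman divergence of phi(x) = -log(x) between two constraint slacks.

    Illustrative choice of barrier (an assumption). Returns +inf when the
    candidate policy has no slack left, i.e. when it is unsafe.
    """
    if slack_new <= 0:
        return math.inf
    ratio = slack_new / slack_old
    return ratio - math.log(ratio) - 1

def constrained_divergence(p_new, p_old, budget, beta=1.0):
    """KL plus a barrier term; in this toy problem the expected cost equals p."""
    return bernoulli_kl(p_new, p_old) + beta * barrier_divergence(
        budget - p_new, budget - p_old)

def trust_region_step(p_old, delta, divergence, grid=10_000):
    """Maximize expected reward (equal to p here) over a grid of candidate
    policies whose divergence from the old policy is at most delta."""
    candidates = [i / grid for i in range(1, grid)]
    feasible = [p for p in candidates if divergence(p, p_old) <= delta]
    return max(feasible)

budget = 0.3   # hypothetical cost budget
p_old = 0.25   # current policy: probability of the risky action (safe, since 0.25 < 0.3)
delta = 0.05   # trust region radius

p_plain = trust_region_step(p_old, delta, bernoulli_kl)
p_safe = trust_region_step(
    p_old, delta, lambda p, q: constrained_divergence(p, q, budget))

print("plain KL step:         ", p_plain, "safe?", p_plain < budget)
print("barrier-augmented step:", p_safe, "safe?", p_safe < budget)
```

In this toy setting the plain KL step overshoots the cost budget, while the barrier-augmented step still increases reward but stays strictly inside the safe set, because every candidate at or beyond the budget has infinite divergence and is excluded from the trust region.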

Taxonomy

Topics: Cryptography and Data Security · Access Control and Trust · Security and Verification in Computing

Methods: Trust Region Policy Optimization