Embedding Safety into RL: A New Take on Trust Region Methods
Nikola Milosevic, Johannes Müller, Nico Scherf

TL;DR
This paper introduces C-TRPO, a novel reinforcement learning method that guarantees safety constraints are satisfied during training by shaping the policy space, improving safety without sacrificing performance.
Contribution
C-TRPO is a new trust region method that ensures safety constraints are maintained throughout training, with theoretical analysis and empirical validation.
Findings
C-TRPO reduces constraint violations effectively.
C-TRPO maintains competitive reward performance.
Theoretical connections to TRPO, NPG, and CPO are established.
Abstract
Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by enforcing safety constraints, yet existing methods either sacrifice reward maximization or allow unsafe training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space geometry to ensure trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
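The abstract's central idea can be written out as a short worked formulation. This is a sketch only: the CMDP problem and the TRPO step are the standard textbook forms, while the symbols $V_r$, $V_c$, $b$, $\delta$ and the divergence name $D_{\mathrm{safe}}$ are notation chosen here for illustration and are not taken from the paper. The last line states the property a "safe" trust region would need, not the authors' exact construction.

```latex
% Standard CMDP: maximize expected return subject to a cost budget b.
\[
  \max_{\pi} \; V_r(\pi)
  \quad \text{s.t.} \quad V_c(\pi) \le b
\]

% Standard TRPO step: improve the surrogate objective inside a KL trust region.
\[
  \pi_{k+1} = \arg\max_{\pi} \;
  \mathbb{E}_{s \sim d_{\pi_k},\, a \sim \pi(\cdot \mid s)}
  \bigl[ A_r^{\pi_k}(s, a) \bigr]
  \quad \text{s.t.} \quad
  \bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \le \delta
\]

% Illustrative assumption: swap the KL divergence for a divergence D_safe
% that grows without bound as the cost approaches the budget. Every policy
% inside the trust region then satisfies the constraint.
\[
  D_{\mathrm{safe}}(\pi \,\|\, \pi_k) \to \infty
  \ \text{ as } \ V_c(\pi) \to b
  \quad \Longrightarrow \quad
  \bigl\{ \pi : D_{\mathrm{safe}}(\pi \,\|\, \pi_k) \le \delta \bigr\}
  \subseteq
  \bigl\{ \pi : V_c(\pi) < b \bigr\}
\]
```

Read this way, the safety guarantee comes from the geometry of the trust region itself rather than from a separate projection or penalty step, provided the current policy $\pi_k$ is already feasible.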
Taxonomy
Topics: Cryptography and Data Security · Access Control and Trust · Security and Verification in Computing
Methods: Trust Region Policy Optimization
