Constrained Policy Optimization
Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

TL;DR
This paper introduces Constrained Policy Optimization (CPO), a policy search algorithm for constrained reinforcement learning that guarantees near-constraint satisfaction at each training iteration, so that neural network policies for high-dimensional control can be trained under safety constraints.
Contribution
CPO is the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration of training.
Findings
Trained neural network policies for robot locomotion subject to safety constraints.
Derived theoretical bounds relating the difference in performance between two policies to a divergence measure between them.
Demonstrated effectiveness on simulated safety-critical control tasks.
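As an illustration of the kind of bound the findings refer to, the following is a sketch in notation that this summary does not define and that is assumed here: J is the expected discounted return, J_{C_i} the expected discounted cost for constraint i, A^π the advantage function of policy π, d^π its discounted state distribution, and D_TV the total variation distance between action distributions at a state. It should be checked against the paper's own statement before being relied on.

```latex
% Assumed notation (not defined in this summary):
%   \epsilon^{\pi'} = \max_s \left| \mathbb{E}_{a \sim \pi'}[A^{\pi}(s,a)] \right|,
%   with \epsilon^{\pi'}_{C_i} defined the same way using the cost advantage A^{\pi}_{C_i}.
\begin{align*}
J(\pi') - J(\pi) &\ge \frac{1}{1-\gamma}\,
  \mathbb{E}_{s \sim d^{\pi},\, a \sim \pi'}
  \left[ A^{\pi}(s,a) - \frac{2\gamma\,\epsilon^{\pi'}}{1-\gamma}\,
  D_{TV}(\pi' \,\|\, \pi)[s] \right], \\
J_{C_i}(\pi') - J_{C_i}(\pi) &\le \frac{1}{1-\gamma}\,
  \mathbb{E}_{s \sim d^{\pi},\, a \sim \pi'}
  \left[ A^{\pi}_{C_i}(s,a) + \frac{2\gamma\,\epsilon^{\pi'}_{C_i}}{1-\gamma}\,
  D_{TV}(\pi' \,\|\, \pi)[s] \right].
\end{align*}
```

Read this way, keeping the new policy close to the old one limits both how much the return can drop and how much a constraint cost can rise in a single update, which is what makes per-iteration guarantees possible.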
Abstract
For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016, Schulman et al., 2015, Lillicrap et al., 2016, Levine et al., 2016) have enabled new capabilities in high-dimensional control, but do not consider the constrained setting. We propose Constrained Policy Optimization (CPO), the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration. Our method allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior all throughout training. Our…
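To make the idea of specifying a reward function and a constraint separately more concrete, here is a minimal Python sketch. It is not the CPO algorithm; it only shows how a discounted return and a discounted constraint cost might be computed from one recorded episode and compared against a limit. The function names, the sample numbers, and the cost limit are all hypothetical, and a single episode is only a rough estimate of the expected quantities a constrained method reasons about.

```python
from typing import Sequence


def discounted_sum(values: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * values[t], accumulated from the last step backward."""
    total = 0.0
    for v in reversed(values):
        total = v + gamma * total
    return total


def satisfies_constraint(costs: Sequence[float], gamma: float, limit: float) -> bool:
    """Check whether the discounted cost of one episode stays within the limit."""
    return discounted_sum(costs, gamma) <= limit


# Hypothetical episode: per-step rewards and per-step safety costs.
rewards = [1.0, 1.0, 1.0]
costs = [0.0, 1.0, 0.0]
gamma = 0.5

episode_return = discounted_sum(rewards, gamma)  # 1 + 0.5 + 0.25 = 1.75
episode_cost = discounted_sum(costs, gamma)      # 0 + 0.5 + 0 = 0.5
within_limit = satisfies_constraint(costs, gamma, limit=0.6)
```

Keeping the cost signal separate from the reward, as in this sketch, means the safety requirement is stated as an explicit limit rather than folded into the reward as a penalty weight that must be tuned.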
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
