Constrained Policy Optimization

Joshua Achiam; David Held; Aviv Tamar; Pieter Abbeel

arXiv:1705.10528·cs.LG·May 31, 2017·112 cites

TL;DR

This paper introduces Constrained Policy Optimization (CPO), a policy search algorithm for constrained reinforcement learning that guarantees near-constraint satisfaction at each iteration, so neural network policies for high-dimensional control can be trained with guarantees on their behavior throughout training.

Contribution

CPO is presented as the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration of training.

Findings

01 Trained neural network policies for simulated robot locomotion under safety constraints.

02 Proved a theoretical bound relating the difference in performance between two policies to a measure of divergence between them.

03 Demonstrated effectiveness on simulated safety-critical control tasks.

Abstract

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016, Schulman et al., 2015, Lillicrap et al., 2016, Levine et al., 2016) have enabled new capabilities in high-dimensional control, but do not consider the constrained setting. We propose Constrained Policy Optimization (CPO), the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration. Our method allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior all throughout training. Our…
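The constrained setting the abstract describes, a reward function plus separate constraints, can be illustrated with a minimal sketch. It assumes the common formulation in which a constraint is an upper limit on the discounted sum of a per-step cost signal. The names `discounted_sum`, `evaluate_trajectory`, and `cost_limit`, along with the toy numbers, are hypothetical and not taken from the paper. This checks a single trajectory only; it is not the CPO update itself.

```python
from typing import Sequence, Tuple


def discounted_sum(values: Sequence[float], gamma: float) -> float:
    """Return sum over t of gamma**t * values[t] for one trajectory."""
    total = 0.0
    # Accumulate from the last step backwards: total_t = v_t + gamma * total_{t+1}.
    for v in reversed(values):
        total = v + gamma * total
    return total


def evaluate_trajectory(
    rewards: Sequence[float],
    costs: Sequence[float],
    gamma: float,
    cost_limit: float,
) -> Tuple[float, float, bool]:
    """Return (discounted return, discounted cost, constraint satisfied?).

    Assumption: the constraint is 'discounted cost <= cost_limit'.
    """
    ret = discounted_sum(rewards, gamma)
    cost = discounted_sum(costs, gamma)
    return ret, cost, cost <= cost_limit


# Toy trajectory (made-up numbers): reward at every step, one unsafe step.
rewards = [1.0, 1.0, 1.0]
costs = [0.0, 1.0, 0.0]
ret, cost, ok = evaluate_trajectory(rewards, costs, gamma=0.5, cost_limit=0.6)
print(ret, cost, ok)
```

Keeping the cost signal separate from the reward means the limit can be stated directly, rather than being folded into the reward through a hand-tuned penalty weight.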

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning