First Order Constrained Optimization in Policy Space
Yiming Zhang, Quan Vuong, Keith W. Ross

TL;DR
This paper introduces FOCOPS, a simple first-order method for reinforcement learning that maximizes reward while satisfying safety constraints, demonstrated to outperform existing methods on robotics tasks.
Contribution
The paper proposes FOCOPS, a novel first-order constrained optimization method in policy space that effectively balances reward maximization and constraint satisfaction.
Findings
Achieves better performance than existing methods on robotic locomotion tasks.
Provides an approximate upper bound on worst-case constraint violation.
Simple to implement because it relies only on first-order updates.
Abstract
In reinforcement learning, an agent attempts to learn high-performing behaviors through interacting with the environment; such behaviors are often quantified in the form of a reward function. However, some aspects of behavior, such as ones which are deemed unsafe and to be avoided, are best captured through constraints. We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS) which maximizes an agent's overall reward while ensuring the agent satisfies a set of cost constraints. Using data generated from the current policy, FOCOPS first finds the optimal update policy by solving a constrained optimization problem in the nonparameterized policy space. FOCOPS then projects the update policy back into the parametric policy space. Our approach has an approximate upper bound for worst-case constraint violation throughout training and is first-order in…
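The two steps described in the abstract can be illustrated with a minimal sketch for a single state with discrete actions. It rests on assumptions not stated in the text above: that the nonparameterized update takes the exponentiated-advantage form pi*(a) ∝ pi_old(a) · exp((A(a) − ν · A_C(a)) / λ), that the projection minimizes KL(pi_theta || pi*), and that the temperature `lam` and cost multiplier `nu` are fixed inputs rather than learned. The function names and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def nonparametric_update(pi_old, adv_reward, adv_cost, lam, nu):
    """Step 1 (assumed form): reweight the current policy by
    exp((reward advantage - nu * cost advantage) / lam) and renormalize."""
    logits = np.log(pi_old) + (adv_reward - nu * adv_cost) / lam
    return softmax(logits)


def project(theta, pi_star, lr=1.0, steps=500):
    """Step 2 (assumed form): first-order projection onto a softmax policy
    by gradient descent on KL(pi_theta || pi_star) w.r.t. the logits."""
    theta = theta.copy()
    for _ in range(steps):
        pi = softmax(theta)
        log_ratio = np.log(pi) - np.log(pi_star)
        kl = np.sum(pi * log_ratio)
        # Exact gradient of the KL with respect to softmax logits.
        grad = pi * (log_ratio - kl)
        theta -= lr * grad
    return theta


# Toy data (hypothetical): four actions, action 0 has high reward but high cost.
pi_old = np.full(4, 0.25)
adv_reward = np.array([1.0, 0.5, 0.0, -0.5])
adv_cost = np.array([1.0, 0.0, 0.0, 0.0])

pi_star = nonparametric_update(pi_old, adv_reward, adv_cost, lam=1.0, nu=0.5)
theta_new = project(np.zeros(4), pi_star)
pi_new = softmax(theta_new)
```

In this sketch, raising `nu` shifts probability away from the costly action, while `lam` controls how far the update policy may move from the current one. A practical implementation would estimate advantages from sampled trajectories and use a neural-network policy instead of a table of logits.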
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
Methods: Trust Region Policy Optimization · Entropy Regularization · Proximal Policy Optimization
