Reduced Policy Optimization for Continuous Control with Hard Constraints

Shutong Ding; Jingya Wang; Yali Du; Ye Shi

arXiv:2310.09574 · cs.LG · December 22, 2023 · 1 citation

TL;DR

This paper introduces RPO, a novel constrained reinforcement learning algorithm that integrates the generalized reduced gradient method to efficiently handle complex hard constraints in continuous control tasks, supported by new benchmark environments.

Contribution

The paper proposes RPO, the first RL algorithm to incorporate GRG for managing both equality and inequality hard constraints, along with new benchmarks for complex constrained environments.

Findings

01 RPO outperforms previous constrained RL algorithms in both reward and constraint satisfaction.

02 The paper develops three new benchmarks with complex hard constraints.

03 RPO effectively handles non-convex and general hard constraints in continuous control.

Abstract

Recent advances in constrained reinforcement learning (RL) have endowed reinforcement learning with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints remains challenging, particularly in those situations with non-convex hard constraints. Inspired by the generalized reduced gradient (GRG) algorithm, a classical constrained optimization technique, we propose a reduced policy optimization (RPO) algorithm that combines RL with GRG to address general hard constraints. RPO partitions actions into basic actions and nonbasic actions following the GRG method and outputs the basic actions via a policy network. Subsequently, RPO calculates the nonbasic actions by solving equations based on equality constraints using the obtained basic actions. The policy network is then updated by implicitly differentiating…
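The action-partitioning step described in the abstract can be illustrated with a minimal sketch. It assumes a single linear equality constraint `A @ action = b`, a fixed vector standing in for the policy network output, and hypothetical helper names (`complete_action`, `nonbasic_jacobian`, `reduced_gradient`). None of this is taken from the authors' code, and the general nonlinear and inequality cases the paper targets are not shown. For this linear case, the implicit function theorem gives the Jacobian of the nonbasic actions with respect to the basic actions in closed form.

```python
import numpy as np

def complete_action(basic, A_basic, A_nonbasic, b):
    """Solve A_basic @ basic + A_nonbasic @ nonbasic = b for the nonbasic actions."""
    return np.linalg.solve(A_nonbasic, b - A_basic @ basic)

def nonbasic_jacobian(A_basic, A_nonbasic):
    """d(nonbasic)/d(basic) = -A_nonbasic^{-1} A_basic (implicit function theorem)."""
    return -np.linalg.solve(A_nonbasic, A_basic)

def reduced_gradient(grad_basic, grad_nonbasic, jac):
    """Gradient of the objective w.r.t. the basic actions along the constraint surface."""
    return grad_basic + jac.T @ grad_nonbasic

# Toy setup (assumption): 3-dimensional action, one constraint a0 + a1 + 2*a2 = 1.
A = np.array([[1.0, 1.0, 2.0]])
b = np.array([1.0])
A_basic, A_nonbasic = A[:, :2], A[:, 2:]

# Stand-in for the policy network output (the basic actions).
basic = np.array([0.2, 0.4])

# Nonbasic actions are computed from the equality constraint, not predicted.
nonbasic = complete_action(basic, A_basic, A_nonbasic, b)
action = np.concatenate([basic, nonbasic])

# Reduced gradient for the toy objective f(action) = sum(action).
jac = nonbasic_jacobian(A_basic, A_nonbasic)
red_grad = reduced_gradient(np.ones(2), np.ones(1), jac)

print("action:", action)
print("constraint residual:", A @ action - b)
print("reduced gradient:", red_grad)
```

Because the nonbasic action is obtained by solving the constraint rather than being predicted, the completed action satisfies the equality constraint by construction. In this sketch, that is the reason only the basic actions need to come from the policy.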

Code & Models

Repositories

wadx2019/rpo (PyTorch, official)

Videos

Reduced Policy Optimization for Continuous Control with Hard Constraints · SlidesLive

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Smart Grid Energy Management