Adversarial Constrained Policy Optimization: Improving Constrained Reinforcement Learning by Adapting Budgets

Jianmina Ma; Jingtian Ji; Yue Gao

arXiv:2410.20786·cs.LG·October 29, 2024

Open Access

TL;DR

This paper introduces ACPO, an adversarial approach to constrained reinforcement learning that adaptively balances reward maximization and constraint satisfaction, improving performance in safety-critical tasks.

Contribution

The paper proposes a novel adversarial framework for constrained RL that dynamically adapts cost budgets and guarantees policy update performance.

Findings

01. ACPO outperforms baseline methods on Safety Gymnasium tasks.

02. The approach effectively balances reward and constraints during training.

03. Theoretical guarantees support the policy update performance.

Abstract

Constrained reinforcement learning has achieved promising progress in safety-critical fields where both rewards and constraints are considered. However, constrained reinforcement learning methods struggle to strike the right balance between task performance and constraint satisfaction, and they are prone to getting stuck in over-conservative or constraint-violating local minima. In this paper, we propose Adversarial Constrained Policy Optimization (ACPO), which enables simultaneous optimization of reward and adaptation of cost budgets during training. Our approach divides the original constrained problem into two adversarial stages that are solved alternately, and the policy update performance of our algorithm is theoretically guaranteed. We validate our method through experiments conducted on Safety Gymnasium and quadruped locomotion tasks. Results demonstrate that our…
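To make the idea of alternating stages with an adapting cost budget concrete, here is a minimal toy sketch. It is not the paper's algorithm: the abstract does not specify the two stages, so the split used here (one stage maximizes reward under the current cost budget, the other minimizes cost under a reward budget), the scalar "policy" `theta`, the functions `reward` and `cost`, the `shrink` factor, and the budget update rule are all assumptions chosen only for illustration.

```python
import math

# Toy problem (illustrative assumption, not from the paper):
# a scalar "policy" theta in [0, 1], reward(theta) = theta, cost(theta) = theta**2.
def reward(theta: float) -> float:
    return theta

def cost(theta: float) -> float:
    return theta ** 2

def max_reward_stage(cost_budget: float) -> float:
    """Assumed stage A: maximize reward subject to cost(theta) <= cost_budget."""
    # Reward increases with theta, so the best feasible theta sits on the cost boundary.
    return min(1.0, math.sqrt(cost_budget))

def min_cost_stage(reward_budget: float) -> float:
    """Assumed stage B: minimize cost subject to reward(theta) >= reward_budget."""
    # Cost increases with theta, so the cheapest feasible theta equals the reward budget.
    return max(0.0, reward_budget)

def alternate(cost_limit: float, shrink: float = 0.9, max_rounds: int = 100):
    """Alternate the two stages, tightening the cost budget toward cost_limit."""
    cost_budget = 1.0  # start from a loose budget (hypothetical choice)
    history = []
    theta = max_reward_stage(cost_budget)
    for _ in range(max_rounds):
        theta = max_reward_stage(cost_budget)
        history.append((cost_budget, theta))
        if cost_budget <= cost_limit:
            break  # the budget has reached the target limit
        # Give up a little reward, then see how much cost that saves.
        reward_budget = shrink * reward(theta)
        theta_b = min_cost_stage(reward_budget)
        # The achieved cost becomes the next budget, never below the target limit.
        cost_budget = max(cost_limit, cost(theta_b))
    return theta, history

if __name__ == "__main__":
    theta, history = alternate(cost_limit=0.25)
    for budget, th in history:
        print(f"cost budget {budget:.4f} -> theta {th:.4f}")
```

In this toy setting the budget shrinks gradually from the loose starting value to the target limit rather than being enforced from the first step, which is the kind of behaviour "adapting budgets" suggests; a real method would replace the closed-form stages with policy-gradient updates.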

Taxonomy

Topics: Blockchain Technology Applications and Security · Adversarial Robustness in Machine Learning