Balancing Constraints and Rewards with Meta-Gradient D4PG

Dan A. Calian; Daniel J. Mankowitz; Tom Zahavy; Zhongwen Xu; Junhyuk Oh; Nir Levine; Timothy Mann

arXiv:2010.06324 · cs.LG · November 30, 2020 · 6 cites

TL;DR

This paper introduces a meta-gradient-based, soft-constrained reinforcement learning method that balances maximizing return against minimizing constraint violations, targeting real-world settings where constraint thresholds may be set incorrectly.

Contribution

It proposes a novel meta-gradient approach to soft-constrained RL that finds a trade-off between return and constraint violations without relying on precisely set constraint thresholds.
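As background, a common way to write a soft-constrained objective is the Lagrangian relaxation of a constrained MDP shown below. This is the standard textbook form, not a transcription of the paper; the symbols J_R, J_C, the threshold \beta, the multiplier \lambda, and its step size \alpha_\lambda are assumed notation, and the paper's exact formulation may differ.

```latex
\begin{aligned}
&\max_{\pi}\; J_R(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t} r_t\Big]
  \quad \text{s.t.} \quad
  J_C(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t} c_t\Big] \le \beta, \\
&\mathcal{L}(\pi, \lambda) = J_R(\pi) - \lambda \big(J_C(\pi) - \beta\big), \qquad \lambda \ge 0, \\
&\min_{\lambda \ge 0}\, \max_{\pi}\; \mathcal{L}(\pi, \lambda), \qquad
  \lambda \leftarrow \max\!\big(0,\; \lambda + \alpha_\lambda \,(J_C(\pi) - \beta)\big).
\end{aligned}
```

Under this formulation the multiplier grows while the constraint is violated and decays toward zero once it is satisfied. If \beta is set so tightly that the task cannot be solved without violating it, \lambda keeps increasing and pushes the policy away from the task, which is the failure mode that motivates a soft-constrained trade-off.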

Findings

01. Outperforms the baselines across four MuJoCo domains

02. Balances expected return against constraint violations

03. Targets real-world-style constraint settings in which thresholds may be misspecified

Abstract

Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints. The constraint thresholds are frequently set incorrectly, due to the complex nature of the system or the inability to verify the thresholds offline (e.g., when no simulator or reasonable offline evaluation procedure exists). This results in settings where the task cannot be solved without violating the constraints. However, in many real-world cases, constraint violations are undesirable but not catastrophic, motivating the need for soft-constrained RL approaches. We present a soft-constrained RL approach that uses meta-gradients to find a good trade-off between expected return and minimizing constraint violations. We demonstrate the effectiveness of this approach by showing that it consistently outperforms the baselines across four different MuJoCo domains.
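The abstract does not spell out the update rules, so the toy below is only a minimal sketch of the general idea under stated assumptions, not the paper's D4PG-based method. It handles the constraint with a Lagrange multiplier `lam` and adapts the multiplier's learning rate `alpha` with a one-step meta-gradient: it differentiates a hypothetical outer objective (return minus a weighted constraint violation) through one inner update. The 1-D return and cost functions, the threshold `BETA`, the step sizes `ETA` and `MU`, and the weight `KAPPA` are all illustrative assumptions.

```python
# Toy sketch (assumption, not the paper's D4PG implementation): a 1-D
# "policy" parameter theta, a Lagrange multiplier lam, and a multiplier
# learning rate alpha adapted by a one-step meta-gradient.

BETA = 1.0    # hypothetical constraint threshold: cost(theta) <= BETA
ETA = 0.1     # inner step size for theta
MU = 0.05     # meta step size for alpha
KAPPA = 2.0   # hypothetical weight on violations in the outer objective


def ret(theta):
    """Hypothetical return, maximized at theta = 2 if unconstrained."""
    return -(theta - 2.0) ** 2


def d_ret(theta):
    return -2.0 * (theta - 2.0)


def cost(theta):
    """Hypothetical constraint cost; satisfied only when theta <= BETA."""
    return theta


def d_cost(theta):
    return 1.0


def outer_objective(theta):
    """Hypothetical soft-constrained score: return minus weighted violation."""
    return ret(theta) - KAPPA * max(0.0, cost(theta) - BETA)


def d_outer(theta):
    violated = 1.0 if cost(theta) > BETA else 0.0
    return d_ret(theta) - KAPPA * violated * d_cost(theta)


def inner_step(theta, lam, alpha):
    """Projected ascent on lam, then gradient ascent on L = R - lam * (C - BETA)."""
    lam_new = max(0.0, lam + alpha * (cost(theta) - BETA))
    theta_new = theta + ETA * (d_ret(theta) - lam_new * d_cost(theta))
    return theta_new, lam_new


def meta_gradient(theta, lam, alpha):
    """d outer_objective(theta_new) / d alpha, via the chain rule through inner_step."""
    g = cost(theta) - BETA
    # d lam_new / d alpha is g where the max(0, .) projection is inactive, else 0.
    dlam_dalpha = g if lam + alpha * g > 0.0 else 0.0
    dtheta_dalpha = -ETA * d_cost(theta) * dlam_dalpha
    theta_new, _ = inner_step(theta, lam, alpha)
    return d_outer(theta_new) * dtheta_dalpha


def run(steps=2000, theta=0.0, lam=0.0, alpha=0.1):
    for _ in range(steps):
        grad = meta_gradient(theta, lam, alpha)
        theta, lam = inner_step(theta, lam, alpha)
        alpha = max(0.0, alpha + MU * grad)  # meta-ascent, kept non-negative
    return theta, lam, alpha
```

Calling `run()` should drive `theta` toward the threshold 1.0 and `lam` toward 2.0, the point where the constraint is active and the Lagrangian is stationary. In this toy the meta-gradient changes how quickly `lam` reacts rather than where the iterates settle; a central finite difference of `outer_objective` with respect to `alpha` is a quick way to check `meta_gradient`.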

Videos

Balancing Constraints and Rewards with Meta-Gradient D4PG · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning