Behavior Alignment via Reward Function Optimization

Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

arXiv:2310.19007 · cs.LG · November 1, 2023 · 2 citations

TL;DR

This paper introduces a bi-level optimization framework for learning reward functions that align agent behavior with designer intentions, effectively integrating heuristics and primary rewards to improve robustness and performance in reinforcement learning.
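To make "bi-level" concrete, here is one generic way such an objective can be written. The notation is assumed for illustration and is not taken from the paper: $r_p$ is the primary reward, $r_{\text{aux}}$ the designer's heuristic reward, $\phi$ the parameters of the learned reward, and $\theta$ the policy parameters. The additive form of $r_\phi$ is likewise an assumption, since this summary does not state the paper's exact parameterization. The inner problem trains a policy on the learned reward, and the outer problem scores that policy only by the primary reward.

```latex
\begin{aligned}
\phi^{\star} &\in \arg\max_{\phi}\; J_{p}\big(\theta^{\star}(\phi)\big),\\
\theta^{\star}(\phi) &\in \arg\max_{\theta}\; \mathbb{E}_{\pi_{\theta}}\Big[\textstyle\sum_{t=0}^{T} \gamma^{t}\, r_{\phi}(S_t, A_t)\Big],\\
J_{p}(\theta) &= \mathbb{E}_{\pi_{\theta}}\Big[\textstyle\sum_{t=0}^{T} \gamma^{t}\, r_{p}(S_t, A_t)\Big],\\
r_{\phi}(s, a) &= r_{p}(s, a) + \phi\, r_{\text{aux}}(s, a) \quad \text{(assumed form)}.
\end{aligned}
```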

Contribution

It proposes a novel method that automatically blends auxiliary heuristics with primary rewards, addressing reward misspecification and enhancing policy robustness in RL.
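A toy sketch of the blending idea, under strong simplifying assumptions: a stateless multi-armed bandit, a single scalar weight `phi` on the heuristic (the assumed form $r_p + \phi\, r_{\text{aux}}$), an exact softmax policy gradient for the inner level, and a plain grid search over `phi` for the outer level in place of whatever gradient-based outer update the paper uses. The names `inner_train` and `outer_search` are hypothetical. The point is only that scoring candidate weights by the primary reward alone can turn a misleading heuristic off and turn a helpful one up.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def expected_reward(probs: Sequence[float], rewards: Sequence[float]) -> float:
    return sum(p * r for p, r in zip(probs, rewards))


def inner_train(phi: float, r_p: Sequence[float], r_aux: Sequence[float],
                steps: int = 50, lr: float = 0.5) -> List[float]:
    """Inner level (hypothetical): fit a softmax bandit policy to r_p + phi * r_aux.

    Uses the exact gradient of E_pi[r] w.r.t. the logits, pi_k * (r_k - E_pi[r]),
    so there is no sampling noise.
    """
    r_blend = [p + phi * a for p, a in zip(r_p, r_aux)]
    theta = [0.0] * len(r_blend)
    for _ in range(steps):
        probs = softmax(theta)
        baseline = expected_reward(probs, r_blend)
        theta = [t + lr * pk * (rk - baseline)
                 for t, pk, rk in zip(theta, probs, r_blend)]
    return softmax(theta)


def outer_search(phis: Sequence[float], r_p: Sequence[float],
                 r_aux: Sequence[float]) -> Tuple[float, float]:
    """Outer level (grid search stand-in): keep the phi whose trained policy
    earns the most *primary* reward."""
    best_phi, best_ret = phis[0], -math.inf
    for phi in phis:
        ret = expected_reward(inner_train(phi, r_p, r_aux), r_p)
        if ret > best_ret:
            best_phi, best_ret = phi, ret
    return best_phi, best_ret


if __name__ == "__main__":
    phis = [0.0, 0.5, 1.0, 2.0]
    r_p = [1.0, 0.0]  # primary reward: arm 0 is the intended behavior
    # Misleading heuristic that rewards arm 1: the search keeps its weight at 0.
    print(outer_search(phis, r_p, [0.0, 1.0]))
    # Heuristic aligned with r_p: the search picks the largest weight, because a
    # bigger reward gap speeds up learning within the fixed step budget.
    print(outer_search(phis, r_p, [1.0, 0.0]))
```

Grid search keeps the example short and deterministic; it does not scale to weights that vary per state and action, which is the setting a method like the one summarized here would need to handle.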

Findings

1. The framework improves performance when heuristic (auxiliary) reward functions are available.

2. It is robust to reward misspecification.

3. It is effective across diverse tasks, including control problems.

Abstract

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is often suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn behavior alignment reward functions. These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the…
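The abstract contrasts the proposed framework with potential-based reward shaping. As background, the standard shaping term adds $F(s, s') = \gamma\,\Phi(s') - \Phi(s)$ to each reward, for a designer-chosen potential $\Phi$. The sketch below shows its mechanics on one trajectory; the chain of states, the goal state, and `distance_potential` are hypothetical choices for illustration, not taken from the paper. It demonstrates only that the shaping terms telescope, so the discounted return shifts by $\gamma^{T}\Phi(s_T) - \Phi(s_0)$; it does not reproduce the performance degradation the authors report.

```python
from typing import Callable, List, Sequence


def shaped_rewards(states: Sequence[int], rewards: Sequence[float],
                   potential: Callable[[int], float], gamma: float) -> List[float]:
    """Return r_t + F(s_t, s_{t+1}) with F(s, s') = gamma * Phi(s') - Phi(s).

    `states` lists s_0 ... s_T for the rewards r_0 ... r_{T-1}.
    """
    if len(states) != len(rewards) + 1:
        raise ValueError("expected len(states) == len(rewards) + 1")
    return [r + gamma * potential(s_next) - potential(s)
            for r, s, s_next in zip(rewards, states[:-1], states[1:])]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def distance_potential(s: int) -> float:
    """Hypothetical heuristic: negative distance to a goal at state 4."""
    return -float(abs(4 - s))


if __name__ == "__main__":
    states = [0, 1, 2, 3, 4]
    rewards = [0.0, 0.0, 0.0, 1.0]  # sparse primary reward on reaching the goal
    gamma = 0.9
    shaped = shaped_rewards(states, rewards, distance_potential, gamma)
    gap = discounted_return(shaped, gamma) - discounted_return(rewards, gamma)
    # Telescoping: gap == gamma**4 * Phi(4) - Phi(0) = 0 - (-4) = 4
    print(round(gap, 6))  # prints 4.0
```

Because the shift depends only on the endpoints of the trajectory, the classical argument is that shaping of this form leaves optimal policies unchanged; the abstract's claim concerns how it affects learning performance in practice.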

Videos

Behavior Alignment via Reward Function Optimization · SlidesLive

Taxonomy

Topics: Software Engineering Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning

Methods: Sparse Evolutionary Training