Behavior Alignment via Reward Function Optimization
Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

TL;DR
This paper introduces a bi-level optimization framework for learning reward functions that align agent behavior with designer intentions, effectively integrating heuristics and primary rewards to improve robustness and performance in reinforcement learning.
Contribution
It proposes a novel method that automatically blends auxiliary heuristics with primary rewards, addressing reward misspecification and enhancing policy robustness in RL.
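The blending idea can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a three-armed bandit, an exact softmax policy-gradient inner loop with a fixed step budget, and a grid search standing in for the outer optimization. All names and numbers (`PRIMARY`, `HELPFUL_AUX`, `MISLEADING_AUX`, the candidate weights) are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(policy, rewards):
    return sum(p * r for p, r in zip(policy, rewards))

def inner_loop(blended, steps=20, lr=0.5):
    """Inner level: a budget-limited learner that ascends the blended reward."""
    logits = [0.0] * len(blended)
    for _ in range(steps):
        pi = softmax(logits)
        baseline = expected_reward(pi, blended)
        # Exact gradient of expected reward w.r.t. softmax logits.
        logits = [t + lr * p * (r - baseline)
                  for t, p, r in zip(logits, pi, blended)]
    return softmax(logits)

def outer_loop(primary, auxiliary, candidates=(0.0, 0.5, 1.0, 2.0)):
    """Outer level: pick the blend weight whose learned policy scores best
    on the primary reward alone (grid search is an assumption of this sketch)."""
    def primary_return(w):
        blended = [rp + w * ra for rp, ra in zip(primary, auxiliary)]
        return expected_reward(inner_loop(blended), primary)
    return max(candidates, key=primary_return)

PRIMARY = [0.0, 1.0, 0.2]          # hypothetical primary reward per action
HELPFUL_AUX = [0.0, 1.0, 0.0]      # heuristic that agrees with the primary reward
MISLEADING_AUX = [1.0, 0.0, 0.0]   # heuristic that favors the worst action

print(outer_loop(PRIMARY, HELPFUL_AUX))
print(outer_loop(PRIMARY, MISLEADING_AUX))
```

In this toy, the outer level keeps a large weight on the heuristic that speeds up learning and drives the weight to zero for the misleading one, because candidate weights are scored only by the primary reward.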
Findings
Framework improves performance with heuristic reward functions
Robustness against reward misspecification demonstrated
Effective across diverse tasks and control challenges
Abstract
Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is often suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn behavior alignment reward functions. These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the…
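For readers unfamiliar with potential-based reward shaping, the sketch below uses the standard definition from the shaping literature (Ng et al., 1999), which this page does not itself spell out: a term γΦ(s') − Φ(s) is added to each reward. The chain environment, the goal state, and the potential function are hypothetical choices made only for illustration.

```python
import math

GOAL = 5  # hypothetical goal state on a 1-D chain

def potential(state):
    """Hypothetical potential: negative distance to the goal."""
    return -abs(GOAL - state)

def shaping_term(state, next_state, gamma):
    """Standard potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * potential(next_state) - potential(state)

def shaped_rewards(states, rewards, gamma):
    """Add the shaping term to each transition's reward."""
    return [r + shaping_term(s, s_next, gamma)
            for r, s, s_next in zip(rewards, states, states[1:])]

def discounted_return(rewards, gamma):
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

states = [0, 1, 2, 3, 4, 5]           # one trajectory walking to the goal
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]   # sparse reward on the final transition
gamma = 0.9

original = discounted_return(rewards, gamma)
shaped = discounted_return(shaped_rewards(states, rewards, gamma), gamma)

# The shaping terms telescope, so the two returns differ only by
# gamma**T * Phi(s_T) - Phi(s_0), which does not depend on the path taken.
T = len(rewards)
offset = (gamma ** T) * potential(states[-1]) - potential(states[0])
print(math.isclose(shaped - original, offset))
```

The telescoping offset is why this form of shaping leaves the optimal policy unchanged in the standard analysis; the abstract's point is that it can still significantly impair performance in the settings the authors investigate.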
Taxonomy
Topics: Software Engineering Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
Methods: Sparse Evolutionary Training
