Action-Dependent Optimality-Preserving Reward Shaping

Grant C. Forbes; Jianxun Wang; Leonardo Villalobos-Arias; Arnav Jhala; David L. Roberts

arXiv:2505.12611 · cs.LG · May 20, 2025

TL;DR

This paper introduces ADOPS, a novel reward shaping method that preserves optimality in complex, exploration-heavy environments, enabling more effective use of intrinsic motivation in reinforcement learning.

Contribution

ADOPS extends reward shaping by allowing action-dependent intrinsic rewards that preserve optimal policies, overcoming limitations of potential-based methods in sparse, exploration-intensive environments.

Findings

01. ADOPS effectively preserves optimality with action-dependent intrinsic rewards.

02. ADOPS outperforms existing methods in Montezuma's Revenge.

03. ADOPS enables better exploration in sparse-reward environments.

Abstract

Recent RL research has used reward shaping, particularly complex shaping rewards such as intrinsic motivation (IM), to encourage agent exploration in sparse-reward environments. While IM is often effective, "reward hacking" can lead to the shaping reward being optimized at the expense of the extrinsic reward, resulting in a suboptimal policy. Potential-Based Reward Shaping (PBRS) techniques such as Generalized Reward Matching (GRM) and Policy-Invariant Explicit Shaping (PIES) have mitigated this by allowing IM to be implemented without altering optimal policies. In this work we show that they are effectively unsuitable for complex, exploration-heavy environments with long-duration episodes. To remedy this, we introduce Action-Dependent Optimality Preserving Shaping (ADOPS), a method of converting intrinsic rewards to an optimality-preserving form that allows agents to utilize IM…
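The contrast between adding an intrinsic bonus directly and using potential-based shaping can be made concrete on a toy pair of trajectories. The sketch below is not ADOPS, whose construction is not described on this page; it uses the standard PBRS form F = γΦ(s') − Φ(s) of Ng, Harada and Russell (1999). The state names, the "novel room" bonus, γ = 0.9, and the choice to reuse that bonus as the potential Φ are all hypothetical.

```python
"""Toy comparison of a naive intrinsic bonus versus potential-based shaping.

Hypothetical illustration only: this is standard PBRS, not the ADOPS method.
"""
from typing import Callable, Dict, List, Sequence, Tuple

GAMMA = 0.9  # hypothetical discount factor
TERMINAL = {"goal", "end"}  # hypothetical terminal states
NOVELTY = {"room_a": 1.0, "room_b": 1.0}  # hypothetical "novel room" bonus

# (states s_0..s_T, extrinsic rewards r_0..r_{T-1})
Trajectory = Tuple[List[str], List[float]]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over one episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def novelty(state: str) -> float:
    """Intrinsic bonus for reaching `state` (toy stand-in for IM)."""
    return NOVELTY.get(state, 0.0)


def potential(state: str) -> float:
    """Potential Phi; zero at terminal states so the episodic guarantee holds."""
    return 0.0 if state in TERMINAL else novelty(state)


def naive_rewards(states: Sequence[str], rewards: Sequence[float],
                  bonus: Callable[[str], float]) -> List[float]:
    """Add the intrinsic bonus for s_{t+1} directly to r_t."""
    return [r + bonus(s_next) for r, s_next in zip(rewards, states[1:])]


def pbrs_rewards(states: Sequence[str], rewards: Sequence[float],
                 phi: Callable[[str], float], gamma: float) -> List[float]:
    """Shaped reward r_t + gamma * Phi(s_{t+1}) - Phi(s_t)."""
    return [r + gamma * phi(s_next) - phi(s)
            for r, s, s_next in zip(rewards, states[:-1], states[1:])]


def evaluate(traj: Trajectory) -> Dict[str, float]:
    """Discounted return of one trajectory under each reward signal."""
    states, rewards = traj
    return {
        "extrinsic": discounted_return(rewards, GAMMA),
        "naive": discounted_return(naive_rewards(states, rewards, novelty), GAMMA),
        "pbrs": discounted_return(pbrs_rewards(states, rewards, potential, GAMMA), GAMMA),
    }


# Trajectory A reaches the extrinsic reward; trajectory B only visits novel rooms.
traj_a: Trajectory = (["start", "key", "goal"], [0.0, 1.0])
traj_b: Trajectory = (["start", "room_a", "room_b", "end"], [0.0, 0.0, 0.0])

if __name__ == "__main__":
    for name, traj in (("A", traj_a), ("B", traj_b)):
        print(name, {k: round(v, 3) for k, v in evaluate(traj).items()})
```

With the naive bonus, the purely exploratory trajectory B scores higher than trajectory A, which actually collects the extrinsic reward: a miniature case of reward hacking. Under PBRS the shaping terms telescope, Σ_t γ^t F_t = γ^T Φ(s_T) − Φ(s_0), so with Φ(s_T) = 0 each trajectory's shaped return is its extrinsic return minus Φ(s_0), and the ranking of A over B is preserved. Note also that in this toy case the PBRS bonus for entering a room is offset when the agent leaves it.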

Videos

Action-Dependent Optimality-Preserving Reward Shaping · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · AI-based Problem Solving and Planning