Action-Dependent Optimality-Preserving Reward Shaping
Grant C. Forbes, Jianxun Wang, Leonardo Villalobos-Arias, Arnav Jhala, David L. Roberts

TL;DR
This paper introduces ADOPS, a novel reward shaping method that preserves optimality in complex, exploration-heavy environments, enabling more effective use of intrinsic motivation in reinforcement learning.
Contribution
ADOPS extends reward shaping by allowing action-dependent intrinsic rewards that preserve optimal policies, overcoming limitations of potential-based methods in sparse, exploration-intensive environments.
Findings
ADOPS effectively preserves optimality with action-dependent intrinsic rewards.
ADOPS outperforms existing methods in Montezuma's Revenge.
ADOPS enables better exploration in sparse-reward environments.
Abstract
Recent RL research has used reward shaping, particularly complex shaping rewards such as intrinsic motivation (IM), to encourage agent exploration in sparse-reward environments. While such shaping is often effective, "reward hacking" can lead to the shaping reward being optimized at the expense of the extrinsic reward, resulting in a suboptimal policy. Potential-Based Reward Shaping (PBRS) techniques such as Generalized Reward Matching (GRM) and Policy-Invariant Explicit Shaping (PIES) have mitigated this problem by allowing IM to be implemented without altering optimal policies. In this work we show that they are effectively unsuitable for complex, exploration-heavy environments with long-duration episodes. To remedy this, we introduce Action-Dependent Optimality-Preserving Shaping (ADOPS), a method of converting intrinsic rewards to an optimality-preserving form that allows agents to utilize IM…
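For context on what "optimality-preserving" means in PBRS, here is a minimal sketch of the classic potential-based shaping term F(s, s') = γΦ(s') − Φ(s). It is not ADOPS, GRM, or PIES, which are not shown here. The integer state encoding, the potential `phi`, and the helper names are hypothetical illustrations. The sketch checks that the discounted sum of shaping terms over an episode telescopes to γ^T Φ(s_T) − Φ(s_0), which reduces to −Φ(s_0) when terminal states get zero potential. Because the sum depends only on the start state, shaping of this form cannot change which policy is optimal.

```python
from typing import Callable, Sequence

# Hypothetical potential function type: maps an (integer-encoded) state to a scalar.
Potential = Callable[[int], float]


def pbrs_bonus(phi: Potential, s: int, s_next: int, gamma: float,
               next_is_terminal: bool = False) -> float:
    """Standard potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).

    By the usual convention, the potential of a terminal state is treated as 0.
    """
    phi_next = 0.0 if next_is_terminal else phi(s_next)
    return gamma * phi_next - phi(s)


def discounted_shaping(phi: Potential, states: Sequence[int], gamma: float) -> float:
    """Discounted sum of shaping terms over one episode s_0, ..., s_T (s_T terminal)."""
    horizon = len(states) - 1
    total = 0.0
    for t in range(horizon):
        bonus = pbrs_bonus(phi, states[t], states[t + 1], gamma,
                           next_is_terminal=(t + 1 == horizon))
        total += gamma ** t * bonus
    return total


if __name__ == "__main__":
    phi = lambda s: 0.5 * s  # hypothetical potential, e.g. a hand-made progress heuristic
    gamma = 0.99
    # Two different trajectories from the same start state s_0 = 2 (Phi(s_0) = 1.0).
    for episode in ([2, 3, 1, 4, 9], [2, 7, 2]):
        # Each sum telescopes to -Phi(s_0) = -1.0, up to floating-point error.
        print(discounted_shaping(phi, episode, gamma))
```

Both trajectories receive the same total shaping even though they visit different states. This is the sense in which PBRS leaves the optimal policy unchanged. Per the abstract, the limitation that ADOPS targets lies elsewhere: existing PBRS-style methods such as GRM and PIES perform poorly in long-episode, exploration-heavy environments.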
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · AI-based Problem Solving and Planning
