Improving the Effectiveness of Potential-Based Reward Shaping in Reinforcement Learning

Henrik Müller; Daniel Kudenko

arXiv:2502.01307·cs.LG·February 4, 2025

TL;DR

This paper enhances potential-based reward shaping in reinforcement learning by introducing a linear shift method that improves exploration efficiency without altering policy preferences, supported by theoretical analysis and empirical validation.

Contribution

It introduces a simple linear shift of the potential function to improve reward shaping effectiveness without changing encoded preferences or adjusting initial Q-values.

Findings

01. Linear shift improves reward shaping effectiveness
02. Theoretical limitations of continuous potential functions identified
03. Empirical validation on Gridworld, Cart Pole, Mountain Car environments

Abstract

Potential-based reward shaping is commonly used to incorporate prior knowledge of how to solve the task into reinforcement learning because it can formally guarantee policy invariance. As such, the optimal policy and the ordering of policies by their returns are not altered by potential-based reward shaping. In this work, we highlight the dependence of effective potential-based reward shaping on the initial Q-values and external rewards, which determine the agent's ability to exploit the shaping rewards to guide its exploration and achieve increased sample efficiency. We formally derive how a simple linear shift of the potential function can be used to improve the effectiveness of reward shaping without changing the encoded preferences in the potential function, and without having to adjust the initial Q-values, which can be challenging and undesirable in deep reinforcement learning. We…
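
To make the mechanism in the abstract concrete, here is a minimal sketch of the standard potential-based shaping term F(s, s') = γΦ(s') − Φ(s) and of what happens to it when the potential is shifted. The excerpt above does not give the exact form of the paper's shift, so this example assumes the simplest case, a constant offset b added to every state's potential. The chain environment, the potential `phi`, and the names `shaping_reward` and `shift_potential` are hypothetical and chosen only for illustration.

```python
import math

def shaping_reward(phi, s, s_next, gamma):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)

def shift_potential(phi, b):
    """Return phi'(s) = phi(s) + b. The constant offset b is an assumption,
    not necessarily the shift derived in the paper."""
    return lambda s: phi(s) + b

# Hypothetical toy potential: negative distance to a goal on a 1-D chain.
GOAL = 5

def phi(s):
    return -abs(GOAL - s)

gamma = 0.9
b = 10.0
phi_shifted = shift_potential(phi, b)

# Shaping reward for the transition 2 -> 3, before and after the shift.
f = shaping_reward(phi, 2, 3, gamma)
f_shifted = shaping_reward(phi_shifted, 2, 3, gamma)

# Differences between state potentials (the encoded preferences) are unchanged.
assert phi_shifted(3) - phi_shifted(2) == phi(3) - phi(2)

# Every shaping reward moves by the same constant, (gamma - 1) * b.
assert math.isclose(f_shifted - f, (gamma - 1) * b)
```

Under this assumption the shift leaves the differences between state potentials intact, while every shaping reward changes by the same amount (γ − 1)b. That changes how the shaping rewards compare with the initial Q-values and external rewards, which is the dependence the abstract highlights.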

Taxonomy

Topics: Innovation Diffusion and Forecasting