Bootstrapped Reward Shaping

Jacob Adamczyk; Volodymyr Makarenko; Stas Tiomkin; Rahul V. Kulkarni

arXiv:2501.00989 · cs.LG · July 28, 2025

TL;DR

This paper introduces BSRS, a bootstrapped reward shaping method that uses the agent's value estimate as a potential function, improving training efficiency in reinforcement learning, especially in sparse-reward environments.

Contribution

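The paper proposes a bootstrapped reward shaping technique that removes the need to hand-design a potential function by using the agent's own value estimates as the potential. It supports the method with convergence proofs in the tabular setting and with empirical gains on the Atari suite.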
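For concreteness, standard PBRS adds a shaping term built from a potential function Φ to the environment reward, and BSRS (per the abstract) sets Φ to the agent's current state-value estimate. The notation below (r̃_t for the shaped reward, V̂ for the current value estimate) is ours, not necessarily the paper's:

```latex
\begin{aligned}
\tilde{r}_t &= r_t + \gamma\,\Phi(s_{t+1}) - \Phi(s_t), \qquad \Phi(s) = \hat{V}(s) \ \text{(BSRS)},\\
\sum_{t=0}^{T-1} \gamma^{t}\bigl[\gamma\,\Phi(s_{t+1}) - \Phi(s_t)\bigr] &= \gamma^{T}\Phi(s_T) - \Phi(s_0).
\end{aligned}
```

The second line is why PBRS leaves the optimal policy unchanged for a fixed Φ: the shaping terms telescope, and with Φ(s_T) = 0 at a terminal state (or γ^T → 0 in the infinite-horizon case) they contribute only −Φ(s_0), which does not depend on the actions taken. In BSRS, Φ changes as V̂ is updated, so this argument does not carry over directly, which is presumably why the paper gives separate convergence proofs for the tabular setting.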

Findings

1. Improves training speed on the Atari suite
2. Provides convergence proofs for the tabular setting
3. Increases the frequency of reward observations by providing a denser reward signal

Abstract

In reinforcement learning, especially in sparse-reward domains, many environment steps are required to observe reward information. In order to increase the frequency of such observations, "potential-based reward shaping" (PBRS) has been proposed as a method of providing a denser reward signal while leaving the optimal policy invariant. However, the required "potential function" must be carefully designed with task-dependent knowledge so as not to hinder training performance. In this work, we propose a "bootstrapped" method of reward shaping, termed BSRS, in which the agent's current estimate of the state-value function acts as the potential function for PBRS. We provide convergence proofs for the tabular setting, give insights into training dynamics for deep RL, and show that the proposed method improves training speed in the Atari suite.
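The core idea in the abstract can be illustrated with a minimal tabular sketch, assuming plain Q-learning with V̂(s) = max_a Q(s, a) as the agent's current value estimate. The chain environment, all hyperparameters, and the scaling coefficient `eta` on the potential are hypothetical choices for illustration; this is not the paper's deep-RL Atari setup or its exact update rule.

```python
import random


def bsrs_q_learning(n_states=10, episodes=2000, alpha=0.1, gamma=0.9,
                    eta=0.5, epsilon=0.3, max_steps=200, seed=0):
    """Tabular Q-learning with a bootstrapped potential (illustrative sketch).

    Hypothetical environment: a 1-D chain of `n_states` states. Action 0 moves
    left (bounded at state 0), action 1 moves right. The only reward is +1 on
    entering the rightmost state, which is terminal (a sparse-reward task).
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]

    def potential(s, terminal):
        # Bootstrapped potential: the current value estimate V(s) = max_a Q(s, a),
        # scaled by the hypothetical coefficient `eta`. Terminal potential is 0.
        return 0.0 if terminal else eta * max(q[s])

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # PBRS: add F = gamma * Phi(s') - Phi(s) to the environment reward,
            # where Phi is read from the Q-table being learned.
            shaped_r = r + gamma * potential(s_next, done) - potential(s, False)
            # Standard Q-learning target computed on the shaped reward.
            target = shaped_r if done else shaped_r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q
```

Calling `bsrs_q_learning()` and taking the argmax of each non-terminal row should recover the "always move right" policy on this chain. Because the potential is read from the same Q-table that is being updated, the term `-eta * max(q[s])` can depend on the very entry being changed; in this sketch, a large `eta` combined with a large `alpha` can therefore make the update overshoot.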
