Bootstrapped Reward Shaping
Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni

TL;DR
This paper introduces BSRS, a bootstrapped reward shaping method that uses the agent's value estimate as a potential function, improving training efficiency in reinforcement learning, especially in sparse-reward environments.
Contribution
The paper proposes a novel bootstrapped reward shaping technique that automates potential function design using value estimates, with theoretical convergence and empirical benefits.
Findings
Improves training speed in Atari games
Provides convergence proofs for tabular cases
Enhances reward observation frequency
Abstract
In reinforcement learning, especially in sparse-reward domains, many environment steps are required to observe reward information. In order to increase the frequency of such observations, "potential-based reward shaping" (PBRS) has been proposed as a method of providing a more dense reward signal while leaving the optimal policy invariant. However, the required "potential function" must be carefully designed with task-dependent knowledge to not deter training performance. In this work, we propose a "bootstrapped" method of reward shaping, termed BSRS, in which the agent's current estimate of the state-value function acts as the potential function for PBRS. We provide convergence proofs for the tabular setting, give insights into training dynamics for deep RL, and show that the proposed method improves training speed in the Atari suite.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
