On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning

Giuseppe Canonaco; Leo Ardon; Alberto Pozanco; Daniel Borrajo

arXiv:2404.07826 · cs.LG · August 12, 2025 · 1 cites

Open Access

TL;DR

This paper investigates how potential-based reward shaping (PBRS) and abstractions can improve sample efficiency in reinforcement learning. It analyzes the bias that finite horizons introduce under PBRS and uses abstractions to approximate the optimal value function, which then serves as the potential.

Contribution

The paper builds theoretical intuition for choosing the optimal value function as the potential, analyzes the bias that finite horizons induce under PBRS, and evaluates abstraction-derived potentials for their effect on sample efficiency and performance.

Findings

1. Using the optimal value function of the task as the potential yields performance gains.

2. Finite horizons introduce a bias that affects how well PBRS works.

3. With abstractions, simpler networks reach performance comparable to CNNs.
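
The finite-horizon finding can be illustrated with the standard telescoping identity for potential-based shaping. This is a sketch under assumed notation (potential \Phi, discount \gamma, horizon H, state s_t at step t, none of which are taken from this page), and it is not necessarily the form of the analysis given in the paper.

```latex
% Shaping term added to the environment reward at each step (assumed notation).
F(s_t, s_{t+1}) = \gamma \Phi(s_{t+1}) - \Phi(s_t)

% Discounted sum of shaping terms over an episode cut off after H steps.
\sum_{t=0}^{H-1} \gamma^t F(s_t, s_{t+1})
  = \sum_{t=0}^{H-1} \left( \gamma^{t+1} \Phi(s_{t+1}) - \gamma^t \Phi(s_t) \right)
  = \gamma^H \Phi(s_H) - \Phi(s_0)
```

For a fixed start state, \Phi(s_0) is the same for every policy, but \gamma^H \Phi(s_H) depends on where the policy happens to be when the episode is truncated. Unless that term is zero, the shaped return can rank policies differently from the original return.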

Abstract

The use of Potential-Based Reward Shaping (PBRS) has shown great promise in the ongoing research effort to tackle sample inefficiency in Reinforcement Learning (RL). However, choosing the right potential function remains an open challenge. Additionally, RL techniques are usually constrained to a finite horizon because of computational limitations, which introduces a bias when using PBRS. In this paper, we first build some theoretically grounded intuition on why selecting the optimal value function of the task at hand as the potential function produces performance advantages. We then analyse the bias induced by finite horizons in the context of PBRS, producing novel insights. Finally, leveraging abstractions as a way to approximate the optimal value function of the given task, we assess the sample efficiency and performance impact of PBRS on four environments including a goal-oriented…
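
One common way to see why the optimal value function is a natural potential is a small worked example. The sketch below is not the paper's code or environment: it assumes a hypothetical deterministic 6-state chain with a reward of 1 for entering the goal, and the names `GAMMA`, `GOAL`, `phi`, `step`, and `shaped_reward` are all illustrative. With the potential set to the optimal value function of this toy task, every optimal step receives a shaped reward of zero and every suboptimal step a negative one, so the shaped reward alone already identifies the best action.

```python
GAMMA = 0.9  # assumed discount factor
GOAL = 5     # hypothetical terminal state of a 6-state chain (states 0..5)


def phi(state):
    """Potential set to the optimal value V* of this toy chain.

    Entering the goal pays 1, so V*(s) = GAMMA ** (GOAL - 1 - s) for
    non-goal states, and the terminal state has value 0.
    """
    if state == GOAL:
        return 0.0
    return GAMMA ** (GOAL - 1 - state)


def step(state, action):
    """Deterministic dynamics: +1 moves right, -1 moves left (clipped at 0)."""
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def shaped_reward(state, action):
    """Environment reward plus the shaping term GAMMA * phi(s') - phi(s)."""
    next_state, reward = step(state, action)
    return reward + GAMMA * phi(next_state) - phi(state)


# Shaped reward for both actions in every non-terminal state.
shaped = {s: {a: shaped_reward(s, a) for a in (-1, +1)} for s in range(GOAL)}

for s, by_action in shaped.items():
    print(s, {a: round(v, 4) for a, v in by_action.items()})
```

In practice the optimal value function is unknown, which is why the abstract turns to abstractions as a way to approximate it.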

Taxonomy

Topics: Reinforcement Learning in Robotics