On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning
Giuseppe Canonaco, Leo Ardon, Alberto Pozanco, Daniel Borrajo

TL;DR
This paper investigates how potential-based reward shaping (PBRS) and abstractions can improve sample efficiency in reinforcement learning. It analyzes the bias that finite horizons introduce into PBRS and uses abstractions to approximate the optimal value function as the potential.
Contribution
The paper builds theoretically grounded intuition for choosing the optimal value function as the potential, analyzes the bias that finite horizons introduce into PBRS, and shows that abstractions can approximate that potential to improve sample efficiency.
Findings
Selecting the optimal value function as the potential yields performance gains.
Using a finite horizon introduces a bias into PBRS, which affects how well the shaping works.
With abstractions, simpler networks reach performance comparable to CNNs.
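As background for the first two findings, the standard PBRS definition and a telescoping argument show where a finite-horizon bias can come from. This is a sketch in conventional notation (discount \gamma, potential \Phi, horizon H), which is assumed here and not taken from the paper; the paper's own analysis may differ in form.

```latex
% Standard potential-based shaping of the reward:
\[
  r'(s, a, s') = r(s, a, s') + \gamma \Phi(s') - \Phi(s)
\]
% Discounted sum of the shaping terms along a trajectory truncated at H steps:
\[
  \sum_{t=0}^{H-1} \gamma^{t} \bigl( \gamma \Phi(s_{t+1}) - \Phi(s_t) \bigr)
  = \gamma^{H} \Phi(s_H) - \Phi(s_0)
\]
```

The term \Phi(s_0) does not depend on the policy, but \gamma^{H} \Phi(s_H) depends on where the policy ends up after H steps. Unless that term vanishes (for example as H grows with \gamma < 1, or when \Phi(s_H) = 0), the shaped and unshaped finite-horizon returns can rank policies differently.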
Abstract
The use of Potential-Based Reward Shaping (PBRS) has shown great promise in the ongoing research effort to tackle sample inefficiency in Reinforcement Learning (RL). However, choosing the right potential function remains an open challenge. Additionally, RL techniques are usually constrained to use a finite horizon because of computational limitations, which introduces a bias when using PBRS. In this paper, we first build some theoretically-grounded intuition on why selecting the potential function as the optimal value function of the task at hand produces performance advantages. We then analyse the bias induced by finite horizons in the context of PBRS, producing novel insights. Finally, leveraging abstractions as a way to approximate the optimal value function of the given task, we assess the sample efficiency and performance impact of PBRS on four environments including a goal-oriented…
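To make the idea of using the optimal value function as the potential concrete, here is a minimal sketch on a toy five-state chain. The environment, the constants, and the function names (step, optimal_values, shaped_reward) are all hypothetical and chosen only for illustration; they are not the paper's environments or code.

```python
import numpy as np

GAMMA = 0.9
N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the terminal goal
ACTIONS = (-1, +1)    # move left / move right


def step(s, a):
    """Deterministic chain dynamics: reward 1 on entering the goal, else 0."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 and s != N_STATES - 1 else 0.0
    return s_next, reward


def optimal_values(n_iters=100):
    """Value iteration; the terminal goal state keeps value 0."""
    v = np.zeros(N_STATES)
    for _ in range(n_iters):
        new_v = np.zeros(N_STATES)
        for s in range(N_STATES - 1):
            new_v[s] = max(r + GAMMA * v[s2]
                           for s2, r in (step(s, a) for a in ACTIONS))
        v = new_v
    return v


def shaped_reward(s, a, phi):
    """PBRS: add gamma * phi(s') - phi(s) to the environment reward."""
    s_next, r = step(s, a)
    return r + GAMMA * phi[s_next] - phi[s]


phi = optimal_values()  # potential set to the optimal value function
for s in range(N_STATES - 1):
    print(s, [round(shaped_reward(s, a, phi), 4) for a in ACTIONS])
```

In this toy setting the shaped reward is zero for the optimal action and negative for the other one in every non-terminal state, so a one-step greedy choice on the shaped reward already recovers the optimal policy. This is one way to see why a potential close to the optimal value function can make learning easier.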
Taxonomy
Topics: Reinforcement Learning in Robotics
