Adaptable Hindsight Experience Replay for Search-Based Learning
Alexandros Vazaios, Jannis Brugger, Cedric Derstroff, Kristian Kersting, Mira Mezini

TL;DR
This paper introduces Adaptable HER, a flexible framework that combines Hindsight Experience Replay with AlphaZero-like search algorithms. It improves learning efficiency in sparse-reward problems and outperforms pure supervised and reinforcement learning.
Contribution
The paper presents a novel adaptable HER framework that integrates with AlphaZero, enabling customizable relabeling strategies to enhance search-based learning.
Findings
Modified HER improves learning in sparse reward settings
Adaptable HER surpasses pure supervised and reinforcement learning
Framework allows flexible adjustment of HER properties
Abstract
AlphaZero-like Monte Carlo Tree Search systems, originally introduced for two-player games, dynamically balance exploration and exploitation using neural network guidance. This combination also makes them suitable for classical search problems. However, the original method of training the network on simulation results is limited in sparse-reward settings, especially in the early stages, when the network cannot yet provide guidance. Hindsight Experience Replay (HER) addresses this issue by relabeling unsuccessful trajectories from the search tree as supervised learning signals. We introduce Adaptable HER, a flexible framework that integrates HER with AlphaZero and allows easy adjustment of HER properties such as relabeled goals, policy targets, and trajectory selection. Our experiments, including equation discovery, show that the possibility of modifying HER is beneficial and…
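To make the relabeling idea concrete, here is a minimal Python sketch. It is not the authors' implementation. The integer state encoding and the names `Step`, `Sample`, `her_relabel`, `goal_mode`, and `policy_mode` are hypothetical. They were chosen to mirror two of the adjustable properties listed in the abstract: the relabeled goal and the policy target. The goal strategies "final" and "future" come from the original HER formulation. Trajectory selection (deciding which paths of the search tree to relabel) is left out, so the sketch assumes a single trajectory has already been chosen.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Step:
    """One step of a trajectory taken from the search tree (hypothetical encoding)."""
    state: int                          # state before acting, encoded as an int for illustration
    action: int                         # index of the action taken from this state
    visits: Optional[List[int]] = None  # MCTS visit counts per action at this node, if recorded


@dataclass
class Sample:
    """A supervised example for a goal-conditioned policy/value network."""
    state: int
    goal: int
    policy: List[float]
    value: float


def one_hot(index: int, size: int) -> List[float]:
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def her_relabel(
    steps: Sequence[Step],
    final_state: int,
    n_actions: int,
    goal_mode: str = "final",
    policy_mode: str = "action",
    rng: Optional[random.Random] = None,
) -> List[Sample]:
    """Treat states the trajectory actually reached as goals, turning a failed run into successes."""
    if rng is None:
        rng = random.Random(0)
    # reached[i] is the state reached after taking steps[i].action.
    reached = [s.state for s in steps[1:]] + [final_state]
    samples: List[Sample] = []
    for i, step in enumerate(steps):
        # Relabeled goal: the last reached state, or a random state reached from here on.
        if goal_mode == "final":
            goal = final_state
        elif goal_mode == "future":
            goal = rng.choice(reached[i:])
        else:
            raise ValueError(f"unknown goal_mode: {goal_mode!r}")

        # Policy target: normalized visit counts if requested and available,
        # otherwise a one-hot vector on the action that was actually taken.
        if policy_mode == "visits" and step.visits and sum(step.visits) > 0:
            total = sum(step.visits)
            policy = [v / total for v in step.visits]
        elif policy_mode in ("action", "visits"):
            policy = one_hot(step.action, n_actions)
        else:
            raise ValueError(f"unknown policy_mode: {policy_mode!r}")

        # Under the relabeled goal this step lies on a successful path, so the value target is 1.
        samples.append(Sample(step.state, goal, policy, 1.0))
    return samples


# Usage: a two-step trajectory 0 -> 1 -> 3 that missed its original goal.
steps = [Step(state=0, action=0), Step(state=1, action=1, visits=[3, 7])]
for sample in her_relabel(steps, final_state=3, n_actions=2):
    print(sample)
```

The one-hot target is always consistent with the relabeled goal, because the recorded action really did lead there. The visit counts, by contrast, were collected while searching for the original goal, so they may not point toward the relabeled one. The sketch exposes this choice as a switch rather than fixing it, in the spirit of the adjustable policy targets described in the abstract.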
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research
