Adaptable Hindsight Experience Replay for Search-Based Learning

Alexandros Vazaios; Jannis Brugger; Cedric Derstroff; Kristian Kersting; Mira Mezini

arXiv:2511.03405·cs.LG·November 6, 2025

TL;DR

This paper introduces Adaptable HER, a flexible framework that combines Hindsight Experience Replay with AlphaZero-like search. It improves learning efficiency in sparse-reward problems and outperforms both pure supervised learning and pure reinforcement learning.

Contribution

The paper presents a novel adaptable HER framework that integrates with AlphaZero, enabling customizable relabeling strategies to enhance search-based learning.

Findings

01 Modified HER improves learning in sparse reward settings

02 Adaptable HER surpasses pure supervised and reinforcement learning

03 Framework allows flexible adjustment of HER properties

Abstract

AlphaZero-like Monte Carlo Tree Search systems, originally introduced for two-player games, dynamically balance exploration and exploitation using neural network guidance. This combination also makes them suitable for classical search problems. However, the original method of training the network with simulation results is limited in sparse reward settings, especially in the early stages, when the network cannot yet give guidance. Hindsight Experience Replay (HER) addresses this issue by relabeling unsuccessful trajectories from the search tree as supervised learning signals. We introduce Adaptable HER, a flexible framework that integrates HER with AlphaZero, allowing easy adjustments to HER properties such as relabeled goals, policy targets, and trajectory selection. Our experiments, including equation discovery, show that the possibility of modifying HER is beneficial and…
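The relabeling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Sample`, `final_state_goal`, `one_hot_policy`, and `relabel` are hypothetical, and it assumes the simplest choices (the final state reached becomes the relabeled goal, the action taken becomes a one-hot policy target, and the value target is 1.0). The two function arguments stand in for the adjustable HER properties the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]  # hypothetical state encoding, for illustration only


@dataclass(frozen=True)
class Sample:
    """One supervised training example produced by hindsight relabeling."""
    state: State
    goal: State
    policy_target: Tuple[float, ...]
    value_target: float


def final_state_goal(states: Sequence[State]) -> State:
    """Assumed goal strategy: treat the last state reached as the goal."""
    return states[-1]


def one_hot_policy(action: int, num_actions: int) -> Tuple[float, ...]:
    """Assumed policy target: all probability mass on the action taken."""
    return tuple(1.0 if a == action else 0.0 for a in range(num_actions))


def relabel(
    states: Sequence[State],
    actions: Sequence[int],
    num_actions: int,
    goal_fn: Callable[[Sequence[State]], State] = final_state_goal,
    policy_fn: Callable[[int, int], Tuple[float, ...]] = one_hot_policy,
) -> List[Sample]:
    """Turn an unsuccessful trajectory into supervised targets.

    states has one more entry than actions: actions[i] leads from
    states[i] to states[i + 1].
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    goal = goal_fn(states)
    # Under the relabeled goal the trajectory counts as a success, so each
    # visited state gets value target 1.0 (an assumption of this sketch).
    return [
        Sample(s, goal, policy_fn(a, num_actions), 1.0)
        for s, a in zip(states[:-1], actions)
    ]
```

Swapping `goal_fn` or `policy_fn` changes how trajectories are relabeled without touching the rest of the training loop, which is the kind of adjustability the abstract describes.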

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research