AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

Liang Ding

arXiv:2603.21357·cs.AI·May 12, 2026

AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

Liang Ding

PDF

1 Repo

TL;DR

AgentHER leverages hindsight experience replay to convert failed LLM agent trajectories into valuable training data, significantly improving success rates and sample efficiency across multiple benchmarks.

Contribution

This work adapts HER to natural-language trajectories, introducing a four-stage pipeline that relabels failures for enhanced training of LLM agents.

Findings

01

AgentHER improves success rates by 7.6-11.4% over success-only SFT.

02

Achieves 2x sample efficiency on WebArena and ToolBench.

03

Reduces label noise from 5.9% to 2.9% with robustness mechanisms.

Abstract

LLM-agent training pipelines routinely discard failed trajectories even though GPT-4o achieves only 14-20% on WebArena and below 55% pass@1 on ToolBench; even specialised systems at 50-65% leave the majority of trajectories unused. We introduce AgentHER, which recovers this lost signal by adapting Hindsight Experience Replay (HER) to natural-language agent trajectories: a trajectory that fails goal A is often a correct demonstration for an achievable alternative goal B. AgentHER realises this through a four-stage pipeline (failure classification, outcome extraction, LLM-guided relabeling with confidence gating, and data packaging) that converts discarded failures into SFT, DPO, and ShareGPT training data. On WebArena and ToolBench under a strict task-disjoint held-out protocol, AgentHER improves over success-only SFT by +7.6-11.4% across four model families (GPT-4o, Qwen2.5-72B/7B,…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

alphadl/AgentHER
github

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.