HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

Woongyeng Yeo; Yumin Choi; Taekyung Ki; Sung Ju Hwang

arXiv:2605.17873 · cs.LG · May 19, 2026

TL;DR

HINT-SD is a targeted self-distillation method for training long-horizon agents. It applies feedback only to failure-relevant actions rather than to every turn, which improves both task performance and training efficiency.

Contribution

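The paper introduces HINT-SD, a framework that uses full-trajectory hindsight to select the specific actions that receive feedback. This addresses two weaknesses of earlier methods: generating feedback at every turn is wasteful, and applying it at a fixed or misaligned turn often misses the actions that caused the failure.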

Findings

01. HINT-SD outperforms dense feedback baselines by up to 18.80%.

02. It achieves 2.26× lower training step time.

03. Targeted feedback selection is crucial for long-horizon training.

Abstract

Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rewards reveal whether a task succeeds, but not which intermediate actions caused the outcome or how they should be corrected. Recent methods alleviate this issue by generating rewards or textual hints from turn-level action-output signals, or by using feedback-conditioned self-distillation. However, generating feedback at every turn is inefficient when many intermediate turns are already successful or neutral, and applying feedback at a fixed or misaligned turn often fails to supervise the actions that contributed to the failure. To bridge this gap, we propose HINT-SD, a targeted self-distillation framework that uses full-trajectory hindsight to select failure-relevant actions and applies feedback-conditioned distillation only on targeted action spans. Experiments on BFCL v3 and AppWorld…
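
The idea of restricting distillation to targeted action spans can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss, so the sketch assumes a token-level KL divergence from a feedback-conditioned teacher distribution to the student distribution, averaged only over tokens inside turns flagged as failure-relevant. The names `Turn`, `failure_relevant`, `span_mask`, and `targeted_distill_loss` are hypothetical, and the flag is assumed to come from some hindsight pass over the full trajectory.

```python
import math
from dataclasses import dataclass


@dataclass
class Turn:
    start: int              # index of the first action token (inclusive)
    end: int                # index one past the last action token (exclusive)
    failure_relevant: bool  # hypothetical label from a full-trajectory hindsight pass


def span_mask(turns, num_tokens):
    """Return a 0/1 mask that is 1 only on tokens of failure-relevant turns."""
    mask = [0.0] * num_tokens
    for turn in turns:
        if turn.failure_relevant:
            for i in range(turn.start, turn.end):
                mask[i] = 1.0
    return mask


def kl(p, q):
    """KL(p || q) for two discrete distributions; assumes q is positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def targeted_distill_loss(teacher_probs, student_probs, mask):
    """Mean KL(teacher || student) over masked tokens only (assumed loss form)."""
    total = sum(m * kl(p, q) for p, q, m in zip(teacher_probs, student_probs, mask))
    count = sum(mask)
    return total / count if count > 0 else 0.0
```

With two turns covering tokens 0-1 and 2-3, where only the second is flagged, `span_mask` returns `[0.0, 0.0, 1.0, 1.0, 0.0]` for a five-token trajectory, so any disagreement between teacher and student on the first turn contributes nothing to the loss. In a real training loop the same mask would be applied to per-token losses computed from model logits.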
