Hindsight Hint Distillation: Scaffolded Reasoning for SWE Agents from CoT-free Answers

Shengjie Wang; Guanghe Li; Zonghan Yang; Yang Gao

arXiv:2605.11556 · cs.AI · May 13, 2026

TL;DR

Hindsight Hint Distillation (HHD) enables training reasoning agents using only question-answer pairs, synthesizing guidance from model failures to improve long-horizon task performance without requiring explicit chain-of-thought annotations.

Contribution

HHD introduces a novel method to learn reasoning strategies from CoT-free data by using model-generated hindsight hints, reducing annotation costs and improving out-of-distribution generalization.

Findings

01

HHD outperforms iterative RFT and trajectory-synthesis baselines by an absolute 8% on SWE-bench Verified.

02

HHD's reasoning strategies generalize well to out-of-distribution tasks.

03

HHD achieves significant improvements without explicit CoT annotations.

Abstract

Solving complex long-horizon tasks requires strong planning and reasoning capabilities. Although datasets with explicit chain-of-thought (CoT) rationales can substantially benefit learning, they are costly to obtain. To address this challenge, we propose Hindsight Hint Distillation (HHD), which only requires easy-to-obtain question-answer pairs without CoT annotations. Inspired by how human teachers use student mistakes to provide targeted guidance, HHD synthesizes hindsight hints from the model's own failed self-rollouts and uses them to scaffold on-policy rollouts that successfully complete the tasks. The model then self-distills these scaffolded trajectories and generalizes to new problems without hint guidance. Experiments show that HHD significantly outperforms iterative RFT and trajectory-synthesis baselines, achieving an absolute improvement of 8% on SWE-bench Verified, while…
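The data-collection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `collect_scaffolded`, `rollout`, `is_correct`, and `make_hint` are hypothetical, tasks the model already solves without a hint are simply skipped, and the stored training example omits the hint from the prompt so that the distilled model can be queried without hints. The toy callables at the bottom stand in for a real policy, verifier, and hint synthesizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Example:
    question: str    # prompt stored WITHOUT the hint (assumption of this sketch)
    trajectory: str  # scaffolded rollout that reached the reference answer


def collect_scaffolded(
    tasks: List[Tuple[str, str]],                  # (question, answer) pairs, no CoT
    rollout: Callable[[str, Optional[str]], str],  # hypothetical policy: (question, hint) -> trajectory
    is_correct: Callable[[str, str], bool],        # hypothetical verifier: (trajectory, answer) -> bool
    make_hint: Callable[[str, str, str], str],     # hypothetical: (question, failed trajectory, answer) -> hint
) -> List[Example]:
    """Collect hint-scaffolded successful rollouts for later self-distillation."""
    data: List[Example] = []
    for question, answer in tasks:
        first = rollout(question, None)            # unhinted self-rollout
        if is_correct(first, answer):
            continue                               # already solved; skipped in this sketch
        hint = make_hint(question, first, answer)  # hindsight hint from the failure
        second = rollout(question, hint)           # on-policy rollout scaffolded by the hint
        if is_correct(second, answer):
            data.append(Example(question, second))
    return data


# Toy stand-ins, for illustration only.
def toy_rollout(question: str, hint: Optional[str]) -> str:
    solved_alone = {"2+2": "4"}
    solved_with_hint = {"3+3": "6"}
    table = solved_alone if hint is None else {**solved_alone, **solved_with_hint}
    return "answer: " + table.get(question, "?")


def toy_is_correct(trajectory: str, answer: str) -> bool:
    return trajectory == "answer: " + answer


def toy_make_hint(question: str, failed: str, answer: str) -> str:
    return "re-check the arithmetic"


tasks = [("2+2", "4"), ("3+3", "6"), ("5+5", "10")]
examples = collect_scaffolded(tasks, toy_rollout, toy_is_correct, toy_make_hint)
```

In this toy run only "3+3" yields a training example: "2+2" is solved without a hint and skipped, and "5+5" fails even with the hint. A fine-tuning step on the collected examples would follow; it is left out here because the abstract does not specify it.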
