Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents
Chenyu Zhao, Shenglin Zhang, Yihang Lin, Wenwei Gu, Zhimin Chen, Yongqian Sun, Dan Pei, Chetan Bansal, Saravan Rajmohan, Minghua Ma

TL;DR
PROBE is a structured recovery framework for software engineering agents that turns failed-run telemetry into grounded diagnoses and bounded recovery guidance for the next attempt.
Contribution
It introduces a framework that organizes runtime evidence from failed runs into grounded diagnoses and actionable guidance, and it outperforms existing baselines.
Findings
PROBE achieves 65.37% Top-1 diagnosis accuracy.
PROBE attains a 21.79% recovery rate, surpassing baselines.
A Microsoft IcM prototype demonstrates non-intrusive deployment.
Abstract
Software engineering agents are increasingly deployed in evaluable engineering environments, yet post-failure recovery remains costly, manual, and ad hoc. Existing systems expose traces or generate follow-up feedback, but they do not convert heterogeneous runtime evidence into grounded, bounded recovery guidance for a subsequent attempt. We present PROBE, a failure-anchored framework for structured recovery in software engineering agents. PROBE organizes failed-run telemetry into structured evidence, structured diagnosis, and bounded recovery guidance through a Telemetry Layer, a Diagnosis Layer, and a Guidance Gate. The Telemetry Layer preserves fine-grained runtime signals, the Diagnosis Layer fuses cross-signal evidence into grounded diagnoses, and the Guidance Gate produces diagnosis-derived guidance only when it is evidence-grounded, actionable, and within the scope of agent-side…
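The three-stage flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the function names, the fields, and the keyword-matching rule that stands in for cross-signal fusion are hypothetical and are not taken from the paper. The only part that follows the abstract directly is the gate condition, which releases guidance only when it is evidence-grounded, actionable, and within agent-side scope.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Evidence:
    signal: str  # hypothetical signal type, e.g. "test_log" or "tool_call"
    detail: str  # the runtime detail preserved from the failed run


@dataclass
class Diagnosis:
    cause: str
    evidence: list[Evidence] = field(default_factory=list)


@dataclass
class Guidance:
    text: str
    actionable: bool  # can the agent act on it in the next attempt?
    agent_side: bool  # is the fix within the agent's own scope?


def telemetry_layer(raw_events: list[dict]) -> list[Evidence]:
    """Turn raw failed-run events into structured evidence, dropping empty ones."""
    return [Evidence(e["signal"], e["detail"]) for e in raw_events if e.get("detail")]


def diagnosis_layer(evidence: list[Evidence], keyword: str) -> Optional[Diagnosis]:
    """Toy stand-in for evidence fusion: collect every item mentioning the keyword."""
    support = [e for e in evidence if keyword in e.detail]
    if not support:
        return None  # no grounded diagnosis without supporting evidence
    return Diagnosis(cause=keyword, evidence=support)


def guidance_gate(diagnosis: Optional[Diagnosis], candidate: Guidance) -> Optional[Guidance]:
    """Release guidance only if it is grounded, actionable, and agent-side."""
    if diagnosis is None or not diagnosis.evidence:
        return None
    if not candidate.actionable or not candidate.agent_side:
        return None
    return candidate


if __name__ == "__main__":
    events = [
        {"signal": "test_log", "detail": "ImportError: no module named foo"},
        {"signal": "tool_call", "detail": "install step skipped before ImportError"},
        {"signal": "heartbeat", "detail": ""},
    ]
    diag = diagnosis_layer(telemetry_layer(events), "ImportError")
    hint = Guidance("Install the missing module before rerunning tests", True, True)
    print(guidance_gate(diag, hint))
```

In this sketch the gate returns `None` instead of weaker guidance, which mirrors the abstract's point that guidance is emitted only when all three conditions hold.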
