Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

Chenyu Zhao; Shenglin Zhang; Yihang Lin; Wenwei Gu; Zhimin Chen; Yongqian Sun; Dan Pei; Chetan Bansal; Saravan Rajmohan; Minghua Ma

arXiv:2605.08717 · cs.SE · May 12, 2026


TL;DR

PROBE is a structured recovery framework for software engineering agents. It turns failed-run telemetry into grounded diagnoses and bounded recovery guidance for the next attempt.

Contribution

PROBE organizes heterogeneous runtime evidence from failed runs into grounded diagnoses and actionable recovery guidance, and it outperforms existing baselines.

Findings

01 PROBE achieves 65.37% Top-1 diagnosis accuracy.

02 PROBE attains a 21.79% recovery rate, surpassing baselines.

03 A Microsoft IcM prototype demonstrates non-intrusive deployment.

Abstract

Software engineering agents are increasingly deployed in evaluable engineering environments, yet post-failure recovery remains costly, manual, and ad hoc. Existing systems expose traces or generate follow-up feedback, but they do not convert heterogeneous runtime evidence into grounded, bounded recovery guidance for a subsequent attempt. We present PROBE, a failure-anchored framework for structured recovery in software engineering agents. PROBE organizes failed-run telemetry into structured evidence, structured diagnosis, and bounded recovery guidance through a Telemetry Layer, a Diagnosis Layer, and a Guidance Gate. The Telemetry Layer preserves fine-grained runtime signals, the Diagnosis Layer fuses cross-signal evidence into grounded diagnoses, and the Guidance Gate produces diagnosis-derived guidance only when it is evidence-grounded, actionable, and within the scope of agent-side…
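
The gating idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, fields, and the `guidance_gate` function are hypothetical, and the scope condition is kept as an opaque flag because the abstract is truncated at that point. The sketch only shows guidance being released when all three stated conditions hold (evidence-grounded, actionable, in scope).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    """One structured telemetry record from a failed run (hypothetical schema)."""
    signal: str   # kind of runtime signal, e.g. "test_output"
    detail: str   # the recorded content


@dataclass
class Diagnosis:
    """A diagnosis together with the evidence records that support it."""
    cause: str
    supporting: List[Evidence] = field(default_factory=list)


@dataclass
class Guidance:
    """Candidate recovery guidance derived from a diagnosis."""
    text: str
    actionable: bool  # assumption: set by an upstream check for a concrete next step
    in_scope: bool    # assumption: stands in for the scope condition, which is truncated in the abstract


def guidance_gate(diagnosis: Diagnosis, candidate: Guidance) -> Optional[Guidance]:
    """Return the candidate only if all three gate conditions hold, else None."""
    grounded = len(diagnosis.supporting) > 0  # at least one evidence record backs the diagnosis
    if grounded and candidate.actionable and candidate.in_scope:
        return candidate
    return None


if __name__ == "__main__":
    ev = Evidence(signal="test_output", detail="AssertionError in test_parse")
    diag = Diagnosis(cause="wrong parser branch", supporting=[ev])
    cand = Guidance(text="re-run with the failing test isolated", actionable=True, in_scope=True)
    print(guidance_gate(diag, cand) is not None)
```

In this sketch a diagnosis with no supporting evidence yields no guidance at all, which mirrors the abstract's statement that guidance is produced only when it is evidence-grounded.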
