From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents

Murong Ma, Tianyu Chen, Yun Lin, Shuai Lu, Qinglin Zhu, Yeyun Gong, Zhiyong Huang, Peng Cheng, Yan Lu, and Jin Song Dong

arXiv:2605.21996 · cs.SE · May 22, 2026

TL;DR

This paper introduces P2T, a novel supervised fine-tuning method that leverages privileged information from reference patches to improve software-engineering agents' effectiveness and efficiency.

Contribution

P2T formulates trajectory construction as a bi-objective optimization problem that uses privileged reference patches to improve the quality of training data for SWE agents.
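
To make the phrase "bi-objective optimization" concrete, here is a minimal sketch of a generic Pareto filter over candidate trajectories. It is not the paper's method: the `Candidate` class, the `effectiveness` and `cost` scores, and the example values are all hypothetical, chosen only to illustrate trading one objective off against another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate trajectory with two hypothetical scores (illustrative only)."""
    name: str
    effectiveness: float  # assumed score, higher is better
    cost: float           # assumed score, lower is better (e.g. step count)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.effectiveness >= b.effectiveness and a.cost <= b.cost
    strictly_better = a.effectiveness > b.effectiveness or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up example values, not data from the paper.
candidates = [
    Candidate("a", effectiveness=0.9, cost=40),
    Candidate("b", effectiveness=0.9, cost=25),
    Candidate("c", effectiveness=0.6, cost=10),
    Candidate("d", effectiveness=0.5, cost=30),
]
front = pareto_front(candidates)
print([c.name for c in front])  # ['b', 'c']
```

In this toy setting, "a" is dropped because "b" is equally effective at lower cost, and "d" is dropped because "b" beats it on both axes. How P2T actually scores and selects trajectories is described in the paper itself.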

Findings

1. P2T improves Pass@1 by up to 10.8 points on SWE-bench Verified.
2. P2T reduces per-instance inference cost by approximately 15%.
3. P2T achieves better effectiveness and efficiency with only 1.8k curated instances.

Abstract

Supervised fine-tuning (SFT) on long teacher trajectories is the dominant way to instill investigation and reasoning in open software-engineering (SWE) agents. Since every retained response becomes an imitation target, the student inherits not only the final outcome but also intermediate flaws, including ungrounded leaps and redundant loops. High-quality training data must be effective (each step is grounded and narrows the agent's epistemic gap to the correct fix) and efficient (each step is information-bearing rather than redundant or looping). Existing recipes filter or relabel teacher rollouts using only a binary terminal verifier, which does not directly target these axes and provides no supervision on instances where the teacher fails. Most real issues include a developer-authored reference patch, $p^{\star}$, revealing the file paths, runtime behaviors, and coding conventions presupposed by the…
