From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents
Murong Ma, Tianyu Chen, Yun Lin, Shuai Lu, Qinglin Zhu, Yeyun Gong, Zhiyong Huang, Peng Cheng, Yan Lu, and Jin Song Dong

TL;DR
This paper introduces P2T, a novel supervised fine-tuning method that leverages privileged information from reference patches to improve software-engineering agents' effectiveness and efficiency.
Contribution
P2T formulates trajectory construction as bi-objective optimization using privileged reference patches, enhancing training data quality for SWE agents.
Findings
P2T improves Pass@1 by up to 10.8 points on SWE-bench Verified.
Reduces per-instance inference cost by approximately 15%.
Achieves better effectiveness and efficiency with only 1.8k curated instances.
Abstract
Supervised fine-tuning (SFT) on long teacher trajectories is the dominant way to instill investigation and reasoning in open software-engineering (SWE) agents. Since every retained response becomes an imitation target, the student inherits both the final outcome and its intermediate flaws, including ungrounded leaps and redundant loops. High-quality training data must be effective (each step is grounded and narrows the agent's epistemic gap to the correct fix) and efficient (each step is information-bearing rather than redundant or looping). Existing recipes filter or relabel teacher rollouts using only a binary terminal verifier, which does not directly target these axes and provides no supervision on instances where the teacher fails. Most real issues include a developer-authored reference patch, revealing the file paths, runtime behaviors, and coding conventions presupposed by the…
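The two data-quality axes above can be made concrete with a toy example. The sketch below assumes a trajectory is a list of (action, target) steps and that the only privileged signal used is the set of files touched by the reference patch. The hypothetical `score_steps` function marks a step effective when it is the first to touch a patched file (so the set of unvisited patched files, a crude stand-in for the epistemic gap, shrinks) and efficient when it does not repeat an earlier (action, target) pair. These proxies, the `Step` type, and the action names are illustrative assumptions, not the objectives P2T actually optimizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str  # hypothetical action name, e.g. "grep" or "open"
    target: str  # file path or search query the action operates on


def score_steps(steps, patch_files):
    """Toy per-step labels using reference-patch files as privileged info.

    effective: the step is the first to touch a file changed by the reference
               patch, so the set of unvisited patched files shrinks.
    efficient: the (action, target) pair has not appeared earlier.
    Returns the per-step labels and the number of patched files never visited.
    """
    unvisited = set(patch_files)
    seen = set()
    labels = []
    for step in steps:
        effective = step.target in unvisited
        unvisited.discard(step.target)
        key = (step.action, step.target)
        efficient = key not in seen
        seen.add(key)
        labels.append((effective, efficient))
    return labels, len(unvisited)


steps = [
    Step("grep", "parse_date"),
    Step("open", "src/utils/dates.py"),
    Step("open", "src/utils/dates.py"),  # redundant re-open of the same file
]
labels, remaining_gap = score_steps(steps, {"src/utils/dates.py"})
print(labels)  # [(False, True), (True, True), (False, False)]
print(remaining_gap)  # 0
```

Unlike a binary terminal verifier, which yields one pass/fail bit per trajectory, labels of this kind exist per step and can be computed even when the teacher never reaches a passing fix, because they depend only on the reference patch.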
