PAGENT: Learning to Patch Software Engineering Agents
Haoran Xue, Gias Uddin, Song Wang

TL;DR
This paper empirically analyzes the causes of failed patches generated by LLM-based code agents, introduces PAGENT to address type-related errors using static analysis and LLM inference, and demonstrates its effectiveness in fixing such patches.
Contribution
It provides a detailed taxonomy of failure reasons in LLM-generated patches and proposes PAGENT, a novel hybrid approach combining static analysis and LLM inference to improve patch correctness.
Findings
Seven top LLM code agents produced 769 failed patches across 114 SWE-bench Lite issues that none of the agents resolved.
The resulting taxonomy has six failure categories; a frequently observed one is incorrect inference of variable types.
PAGENT fixed 29 of 127 type-related failed patches.
Abstract
LLM agents automatically produce patches to resolve issues, but those patches can be incorrect. Little is known about the root causes of such failed patches or how they could be fixed. This paper reports an empirical study of the failed patches generated by seven top LLM code agents. We collected 114 issues from the SWE-bench Lite dataset that remained unresolved by all of the agents. The seven agents produced a total of 769 failed patches for those issues, which we analyzed using a combination of GPT-4o and manual inspection. We present a taxonomy of the failure reasons across these patches, comprising six categories, each with several sub-categories. For example, a frequently observed category is the LLM's inability to correctly infer or produce the appropriate variable type in the generated patch. As a first step towards addressing such…
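The abstract names type-inference errors as a frequent failure category without showing one. The snippet below is a hypothetical Python illustration (SWE-bench Lite issues come from Python repositories) of what such a failure can look like. It is not taken from the paper and does not depict PAGENT's method; the function names and the assumed argument types are invented for illustration.

```python
from typing import List, Union


# Hypothetical failed patch: the agent assumed `names` is always a str,
# so passing a list raises AttributeError (lists have no .strip()).
def normalize_names_failed(names):
    return names.strip().lower()


# Hypothetical corrected patch: the parameter type is widened to match how
# callers are assumed to pass it (a single str or a list of str).
def normalize_names_fixed(names: Union[str, List[str]]) -> List[str]:
    if isinstance(names, str):
        names = [names]
    return [n.strip().lower() for n in names]
```

For example, `normalize_names_failed(["A "])` raises `AttributeError`, while `normalize_names_fixed(["A ", " b"])` returns `["a", "b"]`.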
