Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios
Zhi Chen, Lingxiao Jiang

TL;DR
This paper evaluates the performance of AI-based software development agents in generating patches for real-world GitHub issues, revealing strengths, limitations, and areas for improvement in code quality and complexity management.
Contribution
It provides the first comprehensive real-world evaluation of agent-generated patches, analyzing their impact on code quality, complexity, and issue resolution in practical software engineering scenarios.
Findings
No single agent dominated patch quality.
Agents maintained code reliability and security.
Performance was better on simpler codebases.
Abstract
In recent years, AI-based software engineering has progressed from pre-trained models to advanced agentic workflows, with Software Development Agents representing the next major leap. These agents, capable of reasoning, planning, and interacting with external environments, offer promising solutions to complex software engineering tasks. However, while much research has evaluated code generated by large language models (LLMs), comprehensive studies on agent-generated patches, particularly in real-world settings, are lacking. This study addresses that gap by evaluating 4,892 patches from 10 top-ranked agents on 500 real-world GitHub issues from SWE-Bench Verified, focusing on their impact on code quality. Our analysis shows no single agent dominated, with 170 issues unresolved, indicating room for improvement. Even for patches that passed unit tests and resolved issues, agents made…
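The two headline observations in the abstract (no single agent dominating, and a set of issues left unresolved) can be illustrated with a small set computation. The sketch below is not the authors' analysis code. It assumes "unresolved" means not resolved by any of the evaluated agents, and the function name, agent names, and issue IDs are all hypothetical toy data. Under that reading, 170 unresolved out of 500 issues would be 34% of the benchmark.

```python
from typing import Dict, Set


def summarize_resolution(resolved_by_agent: Dict[str, Set[str]],
                         all_issues: Set[str]) -> dict:
    """Summarize per-agent resolution results (hypothetical helper).

    resolved_by_agent maps an agent name to the set of issue IDs whose
    patches passed the tests; all_issues is the full benchmark issue set.
    """
    # Issues resolved by at least one agent.
    resolved_by_any = set().union(*resolved_by_agent.values()) & all_issues
    # Issues that no agent resolved.
    unresolved = all_issues - resolved_by_any
    # Agent with the most resolved issues (first one wins on ties).
    best_agent = max(resolved_by_agent,
                     key=lambda a: len(resolved_by_agent[a] & all_issues))
    # Issues some other agent resolved but the best agent did not;
    # a non-empty set means the best agent does not dominate.
    missed_by_best = resolved_by_any - resolved_by_agent[best_agent]
    return {
        "unresolved": unresolved,
        "unresolved_rate": len(unresolved) / len(all_issues),
        "best_agent": best_agent,
        "missed_by_best": missed_by_best,
    }


# Toy data only: these are not results from the paper.
issues = {"i1", "i2", "i3", "i4", "i5"}
results = {
    "agent_a": {"i1", "i2", "i4"},
    "agent_b": {"i2", "i3"},
    "agent_c": {"i1"},
}
summary = summarize_resolution(results, issues)
print(summary)
```

In the toy data, agent_a resolves the most issues but still misses one that agent_b resolves, which is the sense in which a top-ranked agent need not dominate the others.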
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Advanced Software Engineering Methodologies
