An Empirical Study on LLM-based Agents for Automated Bug Fixing
Xiangxin Meng, Zexiong Ma, Pengfei Gao, Chao Peng

TL;DR
This paper systematically evaluates six LLM-based bug fixing systems on the SWE-bench Verified benchmark, analyzing their overall performance, fault localization accuracy, and bug reproduction capabilities to identify areas for improvement.
Contribution
It provides a comprehensive empirical analysis of top LLM-based bug fixing agents, highlighting performance variations and suggesting optimization directions.
Findings
Performance varies significantly among systems.
Fault localization accuracy differs at file and symbol levels.
Further optimization of LLMs and agent design is needed.
Abstract
Large language models (LLMs) and LLM-based agents have been applied to fix bugs automatically, demonstrating the ability to address software defects through development environment interaction, iterative validation, and code modification. However, systematic analysis of these agent systems remains limited, particularly regarding performance variations among top-performing ones. In this paper, we examine six repair systems on the SWE-bench Verified benchmark for automated bug fixing. We first assess each system's overall performance, noting the instances solvable by all or none of these systems, and explore the capabilities of different systems. We also compare fault localization accuracy at file and code symbol levels and evaluate bug reproduction capabilities. Through analysis, we conclude that further optimization is needed in both the LLM capability itself and the design…
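The two comparisons described in the abstract (instances solved by all or by none of the systems, and fault localization at file versus symbol level) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the instance and system identifiers, the `path::symbol` naming scheme, and in particular the hit criterion (a prediction counts as a hit when it overlaps the locations touched by the reference patch). The paper may define these measures differently.

```python
from typing import Dict, Set, Tuple


def solved_by_all_and_none(
    all_instances: Set[str], resolved: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str]]:
    """Return (instances every system resolved, instances no system resolved)."""
    per_system = list(resolved.values())
    # With no systems, nothing can be "solved by all".
    solved_by_all = set(all_instances).intersection(*per_system) if per_system else set()
    solved_by_any = set().union(*per_system)
    return solved_by_all, all_instances - solved_by_any


def localization_hit(gold: Set[str], predicted: Set[str]) -> bool:
    """Assumed criterion: a hit means the predicted locations overlap the gold ones."""
    return bool(gold & predicted)


# Hypothetical data: four benchmark instances and two systems.
instances = {"a", "b", "c", "d"}
resolved = {"sys1": {"a", "b"}, "sys2": {"a", "c"}}
by_all, by_none = solved_by_all_and_none(instances, resolved)

# The same patch can hit at file level yet miss at symbol level.
file_hit = localization_hit({"pkg/mod.py"}, {"pkg/mod.py", "pkg/other.py"})
symbol_hit = localization_hit({"pkg/mod.py::Foo.bar"}, {"pkg/mod.py::Foo.baz"})
```

In this toy data, a system edits the right file but the wrong method, which is the kind of gap that a separate symbol-level measure is meant to expose.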
Taxonomy
Topics: Data Mining Algorithms and Applications
