A Comprehensive Empirical Evaluation of Agent Frameworks on Code-centric Software Engineering Tasks
Zhuowen Yin, Cuifeng Gao, Chunsong Fan, Wenzhang Yang, Yinxing Xue, Lijun Zhang

TL;DR
This paper provides a comprehensive empirical evaluation of seven agent frameworks across key software engineering tasks, revealing their strengths, trade-offs, and guiding future improvements in agent-based tools.
Contribution
It offers the first broad comparison of multiple agent frameworks on real-world code-centric tasks using standard benchmarks.
Findings
Agent effectiveness varies across frameworks, and overall success rates are moderate.
AgentOrchestra shows high coordination overhead and longer trajectories.
GPTSwarm is the most cost-efficient framework.
Abstract
Unlike traditional automation tools or static LLM-based systems, agents combine decision-making and tool utilization to accomplish complex tasks, showing great potential in software engineering. However, existing studies largely focus on specific tasks or isolated aspects, providing an incomplete picture of agents' practical capabilities. To address this, we conduct a comprehensive empirical study evaluating seven general-purpose agent frameworks across three representative code-centric tasks: software development, vulnerability detection, and program repair. Each task is assessed using standard, widely adopted benchmarks to ensure objective and comparable evaluation. Agent performance is systematically analyzed from three complementary perspectives: effectiveness (task success), efficiency (execution process), and overhead (token consumption). Our findings reveal distinct capability…
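The three perspectives named in the abstract can be illustrated with a minimal sketch. It assumes each agent run is logged as a record with a success flag, a step count (trajectory length), and a token count. The `RunRecord` type, its field names, and the `summarize` helper are hypothetical and are not taken from the paper; the paper's actual metric definitions may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One agent run on one benchmark instance (hypothetical schema)."""
    framework: str   # name of the agent framework
    task: str        # e.g. "software development", "program repair"
    success: bool    # effectiveness: did the run solve the instance?
    steps: int       # efficiency: length of the execution trajectory
    tokens: int      # overhead: total tokens consumed


def summarize(records):
    """Aggregate per-framework averages for the three perspectives."""
    by_framework = {}
    for r in records:
        by_framework.setdefault(r.framework, []).append(r)

    summary = {}
    for name, runs in by_framework.items():
        summary[name] = {
            # Fraction of runs that succeeded.
            "success_rate": mean([1.0 if r.success else 0.0 for r in runs]),
            # Average number of steps per run.
            "mean_steps": mean([r.steps for r in runs]),
            # Average token consumption per run.
            "mean_tokens": mean([r.tokens for r in runs]),
        }
    return summary
```

Grouping by framework keeps the three numbers side by side, so a framework with a high success rate but long trajectories or heavy token use is easy to spot. A per-task breakdown would follow the same pattern with `(framework, task)` as the grouping key.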
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software Testing and Debugging Techniques
