Benchmarking and Studying the LLM-based Agent System in End-to-End Software Development
Zhengran Zeng, Yixin Li, Rui Xie, Wei Ye, Shikun Zhang

TL;DR
This paper introduces a realistic benchmark and evaluation framework for LLM-based autonomous agents in end-to-end software development, revealing key challenges and guiding future improvements.
Contribution
The paper presents a new, challenging benchmark (E2EDevBench) and a hybrid evaluation framework for assessing LLM-based agents, along with a controlled empirical study of how agent architecture affects results.
Findings
State-of-the-art agents fulfill about 50% of requirements.
Success depends heavily on task decomposition and collaboration strategies.
Main bottlenecks include requirement omission and poor self-verification.
Abstract
The development of LLM-based autonomous agents for end-to-end software development represents a significant paradigm shift in software engineering. However, the scientific evaluation of these systems is hampered by significant challenges, including overly simplistic benchmarks and the difficulty of conducting fair comparisons between different agent architectures due to confounding implementation variables. To address these limitations, we first construct a challenging and dynamically curated E2EDevBench to simulate realistic development scenarios. Second, we propose a hybrid evaluation framework that combines test-case-based functional assessment with fine-grained, LLM-based requirement verification. Using this framework, we conduct a controlled empirical study on three representative agent architectures implemented upon a unified foundation to isolate the impact of workflow design.…
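The hybrid evaluation described in the abstract pairs a test-case signal with a per-requirement verdict. The following is a minimal sketch of that idea, not the paper's implementation: the names `HybridReport`, `evaluate`, and `verify_requirement` are hypothetical, the LLM-based verifier is stood in for by any callable that returns a boolean, and the two scores are reported separately because the abstract does not say how (or whether) they are combined.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class HybridReport:
    """Two independent signals; how the paper aggregates them is not stated."""
    test_pass_rate: float           # fraction of test cases that passed
    requirement_fulfillment: float  # fraction of requirements judged fulfilled


def evaluate(
    test_results: Sequence[bool],
    requirements: Sequence[str],
    verify_requirement: Callable[[str], bool],
) -> HybridReport:
    """Score one generated project.

    test_results: outcome of each functional test case.
    requirements: fine-grained requirement statements.
    verify_requirement: hypothetical stand-in for an LLM-based judge that
        decides whether a single requirement is met.
    """
    if test_results:
        pass_rate = sum(test_results) / len(test_results)
    else:
        pass_rate = 0.0

    if requirements:
        verdicts = [verify_requirement(r) for r in requirements]
        fulfillment = sum(verdicts) / len(verdicts)
    else:
        fulfillment = 0.0

    return HybridReport(pass_rate, fulfillment)
```

In use, `verify_requirement` would wrap a call to whatever model acts as the judge. Keeping it as an injected callable lets the scoring logic be exercised with a deterministic stub.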
Taxonomy
Topics: Software Engineering Techniques and Practices · Multi-Agent Systems and Negotiation · Advanced Software Engineering Methodologies
