Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents
Kaixin Wang, Tianlin Li, Xiaoyu Zhang, Chong Wang, Weisong Sun, Yang Liu, Aishan Liu, Xianglong Liu, Chao Shen, and Bin Shi

TL;DR
This survey analyzes 178 benchmarks for Code Large Language Models and agents across the entire Software Development Life Cycle, highlighting coverage gaps, risks of data leakage, and future research directions.
Contribution
It provides a comprehensive tiered framework for reviewing code benchmarks from an SDLC perspective, revealing imbalances in phase coverage and identifying key challenges and future research directions.
Findings
About 61% of benchmarks focus on the implementation phase
Requirements engineering (5%) and software design (3%) receive minimal benchmark coverage
The lack of anti-contamination strategies increases the risk of data leakage
Abstract
Code large language models (CodeLLMs) and agents are increasingly being integrated into complex software engineering tasks spanning the entire Software Development Life Cycle (SDLC). Benchmarking is critical for rigorously evaluating these capabilities. However, despite the growing significance of these benchmarks, there remains a lack of comprehensive reviews that examine them from an SDLC perspective. To bridge this gap, we propose a tiered analysis framework to systematically review 178 benchmarks from 461 papers, comprehensively characterizing them from the perspective of the SDLC. Our findings reveal a notable imbalance in the coverage of current benchmarks, with approximately 61% focused on the software implementation phase of the SDLC, while the requirements engineering and software design phases receive minimal attention at only 5% and 3%, respectively. Additionally, anti-contamination…
