From What to How: Bridging User Requirements with Software Development Using Large Language Models
Xiao He, Ru Chen, Jialun Cao

TL;DR
This paper introduces DesBench, a benchmark for evaluating large language models on software design tasks, revealing current limitations in handling design intricacies despite progress in code generation.
Contribution
The paper presents DesBench, a novel design-aware benchmark for assessing LLMs on software design, object-oriented modeling, and acceptance test generation, filling a gap in existing evaluation methods.
Findings
LLMs struggle with design-aware code generation from high-level requirements.
LLMs can identify objects and classes but have difficulty defining operations and relationships.
Generated acceptance tests achieve code coverage comparable to that of human-written tests.
Abstract
Large language models (LLMs) have recently been used extensively to improve development efficiency, which has led to numerous benchmarks for evaluating their performance. However, these benchmarks focus predominantly on implementation and overlook the equally critical aspect of software design. This gap raises two pivotal questions: (1) Can LLMs handle software design? (2) Can LLMs write code that follows a specific design? To investigate these questions, this paper proposes DesBench, a design-aware benchmark for evaluating LLMs on three software design-related tasks: design-aware code generation, object-oriented modeling, and the design of acceptance test cases. DesBench comprises 30 manually crafted Java projects that include requirement documents, design models, implementations, and acceptance tests, for a total of 30 design models, 194 Java classes, and 737 test cases. We…
Taxonomy
Topics: Software Engineering Research · Model-Driven Software Engineering Techniques · Topic Modeling
