Scaling Test-Driven Code Generation from Functions to Classes: An Empirical Study
Yunhao Liang, Ruixuan Ying, Shiwen Ni, Zhe Cui

TL;DR
This study extends test-driven code generation from functions to classes, demonstrating that an iterative TDD framework significantly improves class-level correctness and reliability across multiple large language models.
Contribution
The paper introduces a scalable TDD framework for class-level code generation, including a new evaluation dataset and empirical analysis across eight LLMs.
Findings
Class-level correctness improved by 12 to 26 points
Up to 71% of classes are fully correct after TDD
Requires only a small number of repairs on average
Abstract
Test-driven development (TDD) has been adopted to improve Large Language Model (LLM)-based code generation by using tests as executable specifications. However, existing TDD-style code generation studies are largely limited to function-level tasks, leaving class-level synthesis, where multiple methods interact through shared state and call dependencies, underexplored. In this paper, we scale test-driven code generation from functions to classes via an iterative TDD framework. Our approach first analyzes intra-class method dependencies to derive a feasible generation schedule, and then incrementally implements each method under method-level public tests, with reflection-style execution feedback and bounded repair iterations. To support test-driven generation and rigorous class-level evaluation, we construct ClassEval-TDD, a cleaned and standardized variant of ClassEval with consistent…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Model-Driven Software Engineering Techniques
