TL;DR
This paper introduces TDDev, a framework that automates the full cycle of web application development from requirements to deployment, enabling empirical evaluation of test-driven development (TDD) strategies for AI-generated applications.
Contribution
The paper presents TDDev, a novel automated framework for TDD in web app generation, and provides the first empirical study comparing different TDD protocols for coding agents.
Findings
TDD improves web app generation quality by 34-48 percentage points.
Model generation style influences the effectiveness of TDD protocols.
Mismatched TDD protocols can negate these benefits and increase token costs by up to 25-fold.
Abstract
Coding agents can generate web applications from natural-language descriptions, yet a recent benchmark study shows that generated applications fail to meet functional requirements in over 70% of cases. The core difficulty is that web correctness cannot be assessed from source files or terminal output: the application must be deployed, exercised through simulated browser interactions, and failures must be translated into actionable repair signals -- steps that current agents cannot perform without human mediation. We present TDDev, a framework that automates this closed loop through three stages: (1) converting high-level requirements into structured acceptance tests before any code is written, (2) deploying the application and validating it through browser-based interaction simulation, and (3) translating browser-observed failures into structured repair reports for the coding agent.…
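The three stages described in the abstract can be pictured as a small control loop. The sketch below is only an illustration under stated assumptions, not TDDev's actual implementation: every name in it (`AcceptanceTest`, `RepairReport`, `tdd_loop`, and the toy agent and deploy functions) is hypothetical, and the "browser" is replaced by a plain dictionary standing in for the observed page state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Page = Dict[str, str]  # stand-in for what a browser session would observe


@dataclass
class AcceptanceTest:
    """Hypothetical acceptance test: a name plus a check on the deployed page."""
    name: str
    check: Callable[[Page], bool]


@dataclass
class RepairReport:
    """Hypothetical structured report handed back to the coding agent."""
    failed: List[str]

    def to_prompt(self) -> str:
        return "Failing acceptance tests: " + ", ".join(self.failed)


def tdd_loop(
    requirements: str,
    write_tests: Callable[[str], List[AcceptanceTest]],
    generate_app: Callable[[str, str], Page],
    deploy: Callable[[Page], Page],
    max_rounds: int = 3,
) -> Tuple[Page, int, bool]:
    """Run the closed loop; return (app, rounds used, whether all tests passed)."""
    tests = write_tests(requirements)  # stage 1: tests exist before any code
    feedback = ""
    app: Page = {}
    for round_no in range(1, max_rounds + 1):
        app = generate_app(requirements, feedback)
        page = deploy(app)  # stage 2: deploy, then exercise the running app
        failed = [t.name for t in tests if not t.check(page)]
        if not failed:
            return app, round_no, True
        feedback = RepairReport(failed).to_prompt()  # stage 3: repair signal
    return app, max_rounds, False


# Toy stand-ins (assumptions for illustration only).
def toy_write_tests(requirements: str) -> List[AcceptanceTest]:
    return [
        AcceptanceTest("has_title", lambda page: "title" in page),
        AcceptanceTest("has_login_button", lambda page: "login_button" in page),
    ]


def toy_agent(requirements: str, feedback: str) -> Page:
    # The toy agent forgets the login button until a report mentions it.
    app = {"title": "Demo"}
    if "has_login_button" in feedback:
        app["login_button"] = "Log in"
    return app


def toy_deploy(app: Page) -> Page:
    return dict(app)


app, rounds, passed = tdd_loop(
    "A page with a title and a login button",
    toy_write_tests, toy_agent, toy_deploy,
)
```

In this toy run the first attempt fails one test, the report names it, and the second attempt passes. A real system would replace the dictionary checks with browser-driven interaction and the toy agent with an actual coding agent.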
