From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements

Yuxuan Wan, Tingshuo Liang, Jiakai Xu, Jingyu Xiao, Yintong Huo, Michael R Lyu

arXiv:2605.17242 · cs.SE · May 19, 2026

TL;DR

This paper introduces TDDev, a framework that automates the full cycle of web application development from requirements to deployment, enabling empirical evaluation of test-driven development strategies for AI-generated applications.

Contribution

The paper presents TDDev, a novel automated framework for TDD in web app generation, and provides the first empirical study comparing different TDD protocols for coding agents.

Findings

01. TDD improves web app generation quality by 34–48 percentage points.

02. Model generation style influences the effectiveness of TDD protocols.

03. Mismatched TDD protocols can negate these benefits and increase token costs by up to 25-fold.

Abstract

Coding agents can generate web applications from natural-language descriptions, yet a recent benchmark study shows that generated applications fail to meet functional requirements in over 70% of cases. The core difficulty is that web correctness cannot be assessed from source files or terminal output: the application must be deployed and exercised through simulated browser interactions, and the resulting failures must be translated into actionable repair signals. Current agents cannot perform these steps without human mediation. We present TDDev, a framework that automates this closed loop through three stages: (1) converting high-level requirements into structured acceptance tests before any code is written, (2) deploying the application and validating it through browser-based interaction simulation, and (3) translating browser-observed failures into structured repair reports for the coding agent.…
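The three-stage closed loop in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not TDDev's implementation: the names `AcceptanceTest`, `RepairReport`, `validate`, `tdd_loop`, and `toy_agent` are hypothetical; the "app" is modeled as a plain function from a list of actions to rendered page text instead of a deployed server driven by a real browser; the pass check is a simple substring match; and the coding agent is a stub that returns a corrected app once it receives a failure report.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# All names here are hypothetical illustrations, not TDDev's actual API.


@dataclass
class AcceptanceTest:
    """Stage 1: a requirement turned into a checkable test before any code exists."""
    requirement: str
    actions: list[str]   # simulated browser interactions, in order
    expected_text: str   # text the page must display after the actions


@dataclass
class RepairReport:
    """Stage 3: browser-observed failures in a structured, agent-readable form."""
    failures: list[dict] = field(default_factory=list)

    @property
    def all_passed(self) -> bool:
        return not self.failures


# Assumption: an "app" is a function from a list of actions to the rendered page text.
App = Callable[[list[str]], str]
# Assumption: an "agent" maps tests plus an optional repair report to a new app.
Agent = Callable[[list[AcceptanceTest], Optional[RepairReport]], App]


def validate(app: App, tests: list[AcceptanceTest]) -> RepairReport:
    """Stage 2: exercise the app through its interface and record every mismatch."""
    report = RepairReport()
    for test in tests:
        observed = app(test.actions)
        if test.expected_text not in observed:  # simplified substring assertion
            report.failures.append({
                "requirement": test.requirement,
                "actions": test.actions,
                "expected": test.expected_text,
                "observed": observed,
            })
    return report


def tdd_loop(tests: list[AcceptanceTest], agent: Agent,
             max_rounds: int = 3) -> tuple[App, RepairReport, int]:
    """Generate, validate, and repair until all tests pass or the round budget is spent."""
    app = agent(tests, None)          # first attempt, no feedback yet
    report = validate(app, tests)
    rounds = 1                        # number of validation rounds performed
    while not report.all_passed and rounds < max_rounds:
        app = agent(tests, report)    # repair guided by the structured report
        report = validate(app, tests)
        rounds += 1
    return app, report, rounds


def toy_agent(tests: list[AcceptanceTest], report: Optional[RepairReport]) -> App:
    """Stand-in for a coding agent: buggy at first, fixed once it sees a report."""
    def buggy(actions: list[str]) -> str:
        return "Count: 0"             # bug: ignores clicks entirely

    def fixed(actions: list[str]) -> str:
        return f"Count: {actions.count('click #increment')}"

    return buggy if report is None else fixed


# Usage: the acceptance test is written before the agent produces any code.
tests = [AcceptanceTest("Clicking Increment twice shows 2",
                        ["click #increment", "click #increment"], "Count: 2")]
app, report, rounds = tdd_loop(tests, toy_agent)
print(report.all_passed, rounds)  # True 2
```

In a realistic setting, `validate` would deploy the application and drive it with a browser automation tool, and the agent would be an LLM that reads the report. Keeping each failure structured (requirement, actions, expected, observed) is one plausible way to give the agent the actionable repair signals the abstract describes.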

Code & Models

Repositories

yxwan123/TDDev (GitHub)
