RepoZero: Can LLMs Generate a Code Repository from Scratch?

Zhaoxi Zhang; Yiming Xu; Jiahui Liang; Weikang Li; Xiaoshuai Chen; Liwei Qian; Xin Pei; Jizhou Huang; Run Sun; Yunfang Wu

arXiv:2605.07122 · cs.SE · May 21, 2026

TL;DR

RepoZero is an automated benchmark that uses execution-based verification to evaluate whether LLMs can generate complete software repositories from scratch.

Contribution

This work presents RepoZero, the first fully automated, execution-based benchmark for repository-level code generation, and proposes the ACE framework for iterative test-driven refinement.

Findings

01. State-of-the-art LLMs achieve only 30-55% pass rates on RepoZero.

02. RepoZero exposes significant gaps in current LLM capabilities for full repository synthesis.

03. The ACE framework improves code generation through iterative testing and error correction.
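
The third finding describes iterative testing and error correction. The sketch below shows what a generic generate, test, and retry loop of that kind might look like. It is not the ACE framework itself, whose details this summary does not give. The names `refine_until_passing`, `toy_generate`, and `toy_run_tests` are hypothetical, and the "model" is a stub that returns a fixed answer once it receives any failure feedback.

```python
from typing import Callable, List, Tuple


def refine_until_passing(
    generate: Callable[[List[str]], str],
    run_tests: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, int, bool]:
    """Generic test-driven refinement loop (illustrative, not ACE itself).

    generate:  maps the previous round's failure messages to new source code.
    run_tests: maps source code to a list of failure messages (empty = pass).
    Returns (last code, rounds used, whether all tests passed).
    """
    feedback: List[str] = []
    code = ""
    for round_index in range(1, max_rounds + 1):
        code = generate(feedback)
        failures = run_tests(code)
        if not failures:
            return code, round_index, True
        feedback = failures  # feed the errors into the next attempt
    return code, max_rounds, False


def toy_generate(feedback: List[str]) -> str:
    """Hypothetical stand-in for an LLM call: buggy first, fixed after feedback."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_run_tests(code: str) -> List[str]:
    """Execute the candidate code and report mismatches as messages."""
    namespace: dict = {}
    exec(code, namespace)  # acceptable for a toy; sandbox real generated code
    failures = []
    for a, b, expected in [(2, 3, 5), (0, 0, 0)]:
        got = namespace["add"](a, b)
        if got != expected:
            failures.append(f"add({a}, {b}) returned {got}, expected {expected}")
    return failures


final_code, rounds_used, passed = refine_until_passing(toy_generate, toy_run_tests)
print(rounds_used, passed)
```

In this toy run the first attempt fails one test, the failure message is passed back, and the second attempt passes. A real setup would replace the stub with a model call and run generated code in an isolated environment.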

Abstract

Large Language Models (LLMs) have recently shown remarkable progress in code generation, yet their ability to construct complete software repositories from scratch remains poorly understood. A fundamental bottleneck is the lack of verifiable and scalable evaluation: existing benchmarks either focus on patch-based editing or rely on human or LLM-based judgments, which introduce bias and limit reproducibility. In this work, we present RepoZero, the first benchmark that enables fully automated, execution-based verification of repository-level generation from scratch. Our key idea is to reformulate generation as repository reproduction: given only API specifications, an agent must re-implement an entire repository such that its behavior matches the original implementation. This design allows for strict black-box validation via output equivalence, while naturally supporting large-scale…
