RepoGenesis: Benchmarking End-to-End Microservice Generation from Readme to Repository

Zhiyuan Peng; Xin Yin; Pu Zhao; Fangkai Yang; Lu Wang; Ran Jia; Xu Chen; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang

arXiv:2601.13943·cs.SE·April 16, 2026

TL;DR

RepoGenesis is a benchmark for evaluating end-to-end generation of complete microservice repositories from README files. It exposes the limitations of current systems and provides a testbed for future improvements.

Contribution

It introduces the first multilingual, repository-level benchmark for microservice generation: 106 repositories (60 Python, 46 Java) with 1,258 API endpoints and 2,335 test cases, evaluated with Pass@1, API Coverage (AC), and Deployment Success Rate (DSR).

Findings

01 Open-source agents reach up to 73.91% API Coverage but score low on Pass@1.

02 Even the best systems stay below 24% Pass@1, leaving substantial room for improvement.

03 The fine-tuned GenesisAgent-8B performs comparably to GPT-5 mini, an indication of the benchmark's quality.

Abstract

Large language models and agents have achieved remarkable progress in code generation. However, existing benchmarks focus on isolated function/class-level generation (e.g., ClassEval) or modifications to existing codebases (e.g., SWE-Bench), neglecting complete microservice repository generation that reflects real-world 0-to-1 development workflows. To bridge this gap, we introduce RepoGenesis, the first multilingual benchmark for repository-level end-to-end web microservice generation, comprising 106 repositories (60 Python, 46 Java) across 18 domains and 11 frameworks, with 1,258 API endpoints and 2,335 test cases verified through a "review-rebuttal" quality assurance process. We evaluate open-source agents (e.g., DeepCode) and commercial IDEs (e.g., Cursor) using Pass@1, API Coverage (AC), and Deployment Success Rate (DSR). Results reveal that despite high AC (up to 73.91%) and DSR…
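The three metrics named in the abstract can be illustrated with a minimal sketch. The definitions below are assumptions made for illustration, since this page does not spell them out: Pass@1 is taken as the fraction of repositories whose single generated attempt deploys and passes every test, AC as the share of README-specified endpoints found in the generated code, and DSR as the fraction of repositories whose service starts successfully. The `RepoResult` record and every number in the usage example are hypothetical, not results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RepoResult:
    """Outcome of one generated repository (hypothetical record layout)."""
    deployed: bool              # did the generated service start?
    endpoints_required: int     # endpoints specified in the README
    endpoints_implemented: int  # required endpoints found in the generated code
    tests_total: int            # test cases available for this repository
    tests_passed: int           # test cases passed by the single attempt


def _require_nonempty(results: Sequence[RepoResult]) -> None:
    if not results:
        raise ValueError("at least one repository result is required")


def pass_at_1(results: Sequence[RepoResult]) -> float:
    """Assumed definition: share of repositories that deploy and pass all tests."""
    _require_nonempty(results)
    passed = sum(
        1 for r in results
        if r.deployed and r.tests_total > 0 and r.tests_passed == r.tests_total
    )
    return passed / len(results)


def api_coverage(results: Sequence[RepoResult]) -> float:
    """Assumed definition: implemented endpoints over required endpoints, pooled."""
    _require_nonempty(results)
    required = sum(r.endpoints_required for r in results)
    implemented = sum(r.endpoints_implemented for r in results)
    return implemented / required


def deployment_success_rate(results: Sequence[RepoResult]) -> float:
    """Assumed definition: share of repositories whose service starts."""
    _require_nonempty(results)
    return sum(1 for r in results if r.deployed) / len(results)


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = [
        RepoResult(True, 10, 8, 20, 20),
        RepoResult(True, 12, 9, 25, 10),
        RepoResult(False, 8, 3, 15, 0),
        RepoResult(True, 10, 10, 10, 10),
    ]
    print(f"Pass@1: {pass_at_1(toy):.2%}")               # 50.00%
    print(f"AC:     {api_coverage(toy):.2%}")            # 75.00%
    print(f"DSR:    {deployment_success_rate(toy):.2%}") # 75.00%
```

The toy data shows how the three numbers can diverge: a repository can deploy and expose most of its endpoints while still failing tests, which is the pattern of high AC and DSR alongside low Pass@1 that the abstract reports.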

Code & Models

Repositories

pzy2000/RepoGenesis (GitHub)