Repo2Run: Automated Building Executable Environment for Code Repository at Scale

Ruida Hu; Chao Peng; Xinchen Wang; Junjielong Xu; Cuiyun Gao

arXiv:2502.13681 · cs.SE · October 21, 2025

TL;DR

Repo2Run is an LLM-based agent that automates building executable test environments for code repositories. Given a repository, it iteratively builds a Docker image, runs the unit tests, and synthesizes a Dockerfile until the whole pipeline succeeds, reaching an 86.0% build success rate.

Contribution

The paper introduces what the authors describe as the first LLM-based agent for automatically building Docker environments for repositories at scale, reducing manual effort and raising build success rates.

Findings

01 · Achieves an 86.0% success rate in building environments

02 · Outperforms previous methods by 77.0%

03 · Successfully automates Dockerfile synthesis for Python repositories

Abstract

Scaling up executable code data is significant for improving language models' software engineering capability. The intricate nature of the process makes it labor-intensive, time-consuming and expert-knowledge-dependent to build a large number of executable code repositories, limiting the scalability of existing work based on running tests. The primary bottleneck lies in the automated building of test environments for different repositories, which is an essential yet underexplored task. To mitigate the gap, we introduce Repo2Run, the first LLM-based agent aiming at automating the building of executable test environments for any repositories at scale. Specifically, given a code repository, Repo2Run iteratively builds the Docker image, runs unit tests based on the feedback of the building, and synthesizes the Dockerfile until the entire pipeline is executed successfully. The resulting…
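The build, test, and repair loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `build_environment`, `render_dockerfile`, `run_pipeline`, and `propose_fix` are hypothetical names, the Docker build and test run are replaced by an injected callable so the example runs without Docker, and the LLM agent is replaced by a trivial rule that reads the missing module name from the error message.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class BuildResult:
    success: bool
    dockerfile: str          # last Dockerfile that was actually tried
    attempts: int
    log: List[str] = field(default_factory=list)


def render_dockerfile(base_image: str, commands: List[str]) -> str:
    """Turn the accumulated setup commands into Dockerfile text."""
    lines = [f"FROM {base_image}", "WORKDIR /repo", "COPY . /repo"]
    lines += [f"RUN {cmd}" for cmd in commands]
    return "\n".join(lines) + "\n"


def build_environment(
    run_pipeline: Callable[[str], Tuple[bool, str]],  # hypothetical: build image + run tests
    propose_fix: Callable[[str], str],                # hypothetical: stands in for the LLM agent
    base_image: str = "python:3.10",
    max_attempts: int = 5,
) -> BuildResult:
    """Repeat build -> test -> repair until the pipeline passes or attempts run out."""
    commands: List[str] = []
    log: List[str] = []
    dockerfile = render_dockerfile(base_image, commands)
    for attempt in range(1, max_attempts + 1):
        dockerfile = render_dockerfile(base_image, commands)
        ok, feedback = run_pipeline(dockerfile)
        log.append(feedback)
        if ok:
            return BuildResult(True, dockerfile, attempt, log)
        # Use the failure feedback to decide the next setup command.
        commands.append(propose_fix(feedback))
    return BuildResult(False, dockerfile, max_attempts, log)


# Stand-ins so the sketch runs without Docker or an LLM (assumptions, not real tools).
def fake_pipeline(dockerfile: str) -> Tuple[bool, str]:
    if "pip install requests" in dockerfile:
        return True, "all tests passed"
    return False, "ModuleNotFoundError: No module named 'requests'"


def fake_fix(feedback: str) -> str:
    missing = feedback.rsplit("'", 2)[1]  # module name between the last two quotes
    return f"pip install {missing}"


if __name__ == "__main__":
    result = build_environment(fake_pipeline, fake_fix)
    print(result.success, result.attempts)
    print(result.dockerfile)
```

Passing the pipeline and the repair step in as callables keeps the loop itself testable. In a real setting `run_pipeline` would presumably invoke a Docker build and the repository's unit tests, and `propose_fix` would query a language model with the build feedback.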

Code & Models

Repositories

bytedance/repo2run (official)

Videos

Repo2Run: Automated Building Executable Environment for Code Repository at Scale · SlidesLive

Taxonomy

Topics: Distributed and Parallel Computing Systems · Software System Performance and Reliability · Distributed systems and fault tolerance