RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale

Zhilong Chen; Chengzong Zhao; Boyuan Chen; Dayi Lin; Yihao Chen; Arthur Leung; Gopi Krishnan Rajbahadur; Gustavo A. Oliva; Haoxiang Zhang; Aaditya Bhatia; Chong Chun Yong; Ahmed E. Hassan

arXiv:2508.01550·cs.SE·September 4, 2025

TL;DR

RepoForge introduces an end-to-end data curation pipeline that makes training small-scale SWE LLMs cheaper and more efficient, reaching state-of-the-art results among ≤8B non-thinking models.

Contribution

The paper presents RepoForge, a scalable, automated pipeline combining data generation, evaluation, and training for SWE LLMs, enabling state-of-the-art performance at a fraction of traditional costs.

Findings

01. Achieved 17.4% on SWE-Bench-Verified, setting a new SOTA for ≤8B non-thinking LLMs.

02. Generated 7,304 executable environments from real GitHub commits with zero manual intervention.

03. Reduced storage by 14× (1.4GB → 102MB per instance) and evaluation time by over 70%.

Abstract

Training software engineering (SWE) LLMs is bottlenecked by expensive infrastructure, inefficient evaluation pipelines, scarce training data, and costly quality control. We present RepoForge, an autonomous, end-to-end pipeline that generates, evaluates, and trains SWE agents at scale. Our key contributions include: (1) RepoForge-8B-Agent, achieving 17.4% on SWE-Bench-Verified \citep{swebench_verified2024}, establishing new state-of-the-art for ≤8B non-thinking LLMs; (2) 7,304 executable environments auto-generated from real GitHub commits with zero manual intervention; (3) 14× storage reduction (1.4GB → 102MB per instance) via intelligent dependency management and image pruning; (4) >70% faster evaluation using a Ray-powered \citep{ray2018} distributed RepoForge harness; (5) 19,000× cheaper labeling through our automated SPICE \citep{spice2024}…
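As a rough sanity check on the storage figures quoted in the abstract, the arithmetic can be sketched as follows. The per-instance sizes and the environment count come from the abstract. The totals are an illustration only: they assume every one of the 7,304 environments matches the quoted per-instance sizes, which the abstract does not state. The helper name `reduction_factor` is hypothetical.

```python
# Figures quoted in the abstract.
BEFORE_GB = 1.4   # per-instance image size before pruning
AFTER_MB = 102    # per-instance image size after pruning
NUM_ENVS = 7304   # executable environments generated


def reduction_factor(before_gb: float, after_mb: float, mb_per_gb: int) -> float:
    """Return how many times smaller the pruned image is."""
    return before_gb * mb_per_gb / after_mb


# The abstract does not say whether 1 GB means 1000 MB or 1024 MB,
# so compute the ratio under both conventions.
decimal_ratio = reduction_factor(BEFORE_GB, AFTER_MB, 1000)
binary_ratio = reduction_factor(BEFORE_GB, AFTER_MB, 1024)

# Assumption (not from the source): every environment has exactly the
# quoted per-instance size, and decimal units are used for the totals.
total_before_tb = NUM_ENVS * BEFORE_GB / 1000
total_after_gb = NUM_ENVS * AFTER_MB / 1000

print(f"reduction (1 GB = 1000 MB): {decimal_ratio:.1f}x")
print(f"reduction (1 GB = 1024 MB): {binary_ratio:.1f}x")
print(f"all environments before pruning: {total_before_tb:.1f} TB")
print(f"all environments after pruning:  {total_after_gb:.0f} GB")
```

Under either unit convention the ratio rounds to the 14× reported in the abstract.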

Taxonomy

Topics: Software Engineering Research · Scientific Computing and Data Management · Software Testing and Debugging Techniques