RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale
Zhilong Chen, Chengzong Zhao, Boyuan Chen, Dayi Lin, Yihao Chen, Arthur Leung, Gopi Krishnan Rajbahadur, Gustavo A. Oliva, Haoxiang Zhang, Aaditya Bhatia, Chong Chun Yong, Ahmed E. Hassan

TL;DR
RepoForge is an end-to-end data curation pipeline that generates, evaluates, and trains small SWE LLM agents. Its 8B agent reaches 17.4% on SWE-Bench-Verified while cutting storage, evaluation, and labeling costs.
Contribution
The paper presents RepoForge, a scalable, automated pipeline combining data generation, evaluation, and training for SWE LLMs, enabling state-of-the-art performance at a fraction of traditional costs.
Findings
Achieved 17.4% on SWE-Bench-Verified, a new state of the art for non-thinking models at ≤8B parameters.
Generated 7,304 executable environments with zero manual effort.
Reduced storage by 14× and evaluation time by over 70%.
Abstract
Training software engineering (SWE) LLMs is bottlenecked by expensive infrastructure, inefficient evaluation pipelines, scarce training data, and costly quality control. We present RepoForge, an autonomous, end-to-end pipeline that generates, evaluates, and trains SWE agents at scale. Our key contributions include: (1) RepoForge-8B-Agent, achieving 17.4\% on SWE-Bench-Verified~\citep{swebench_verified2024}, establishing new state-of-the-art for 8B non-thinking LLMs; (2) 7,304 executable environments auto-generated from real GitHub commits with zero manual intervention; (3) 14$\times$ storage reduction (1.4GB $\rightarrow$ 102MB per instance) via intelligent dependency management and image pruning; (4) 70\% faster evaluation using a Ray-powered~\citep{ray2018} distributed RepoForge harness; (5) 19,000$\times$ cheaper labeling through our automated SPICE~\citep{spice2024}…
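The storage figure in contribution (3) can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `reduction_factor` is hypothetical, and because the abstract does not say whether 1 GB means 1000 MB or 1024 MB, both conventions are computed.

```python
def reduction_factor(before_mb: float, after_mb: float) -> float:
    """Return how many times smaller `after_mb` is than `before_mb`."""
    return before_mb / after_mb


# Per-instance image sizes quoted in the abstract.
BEFORE_GB = 1.4
AFTER_MB = 102.0

# Assumption: the unit convention is not stated, so try both.
decimal_factor = reduction_factor(BEFORE_GB * 1000, AFTER_MB)  # 1 GB = 1000 MB
binary_factor = reduction_factor(BEFORE_GB * 1024, AFTER_MB)   # 1 GB = 1024 MB

print(f"decimal units: {decimal_factor:.1f}x, binary units: {binary_factor:.1f}x")
```

Under either convention the ratio rounds to 14, which matches the reported 14× reduction.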
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Software Testing and Debugging Techniques
