SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks
Lianghong Guo, Yanlin Wang, Caihua Li, Wei Tao, Pengyu Yang, Jiachi Chen, Haoyu Song, Duyu Tang, and Zibin Zheng

TL;DR
SWE-Factory is an automated pipeline that constructs large-scale GitHub issue resolution datasets. It removes the manual steps that limited previous pipelines and enables efficient evaluation of LLMs on software engineering tasks.
Contribution
It introduces SWE-Factory, a fully automated data construction pipeline with novel components for binary file recovery, LLM-based environment setup, and log parsing for validation.
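Binary file recovery starts with detecting which binary changes a collected patch is missing. The sketch below assumes the patch is in git's default text-diff format. In that format, `git diff` without `--binary` replaces each binary change with a "Binary files ... differ" placeholder, so the actual file contents are lost. The function name `find_omitted_binary_files` is hypothetical, and this is not presented as SWE-Factory's implementation.

```python
import re

# git's placeholder for a binary change when the diff is produced without
# --binary, e.g. "Binary files /dev/null and b/tests/data/logo.png differ".
BINARY_LINE = re.compile(r"^Binary files (\S+) and (\S+) differ$")


def find_omitted_binary_files(patch_text: str) -> list[str]:
    """Return repo-relative paths whose binary contents are missing from a diff.

    Hypothetical helper: paths containing spaces are not handled by this regex.
    """
    paths = []
    for line in patch_text.splitlines():
        match = BINARY_LINE.match(line)
        if match is None:
            continue
        old, new = match.groups()
        # For deleted files the new side is /dev/null, so report the old path.
        path = new if new != "/dev/null" else old
        # Strip git's "a/" or "b/" side prefix.
        paths.append(path[2:] if path[:2] in ("a/", "b/") else path)
    return paths


patch = (
    "diff --git a/tests/data/logo.png b/tests/data/logo.png\n"
    "new file mode 100644\n"
    "Binary files /dev/null and b/tests/data/logo.png differ\n"
)
print(find_omitted_binary_files(patch))  # ['tests/data/logo.png']
```

Once these paths are known, their contents could be fetched from the fix commit (for example with `git show <commit>:<path>`) and written into the evaluation environment. How SWE-Factory performs that step is not described in this summary.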
Findings
Successfully constructed evaluation datasets for 671 issues across four languages.
Achieved high validation accuracy with an F1 score of 0.99.
Demonstrated cost-effective environment construction at $0.047 per instance.
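For reference, the reported F1 score is the harmonic mean of precision and recall. This summary does not define what counts as a positive. A plausible assumption is an instance that the automated validation accepts, scored against manual inspection.

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```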
Abstract
Constructing large-scale datasets for the GitHub issue resolution task is crucial for both training and evaluating the software engineering capabilities of Large Language Models (LLMs). However, the existing GitHub issue resolution data construction pipeline is challenging and labor-intensive. We identify three key limitations in existing pipelines: (1) the collected test patches often omit binary file changes; (2) the manual construction of evaluation environments is labor-intensive; and (3) the fail2pass validation phase requires manual inspection of test logs and custom parsing code to extract test statuses from those logs. In this paper, we propose SWE-Factory, a fully automated issue resolution data construction pipeline, to resolve these limitations. First, our pipeline automatically recovers missing binary test files and ensures the correctness of test patches. Second, we introduce…
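Fail2pass validation, as popularized by SWE-bench, keeps an instance only if at least one test fails before the fix and passes after it. The tests run with the test patch applied, first without and then with the gold patch. The sketch below assumes pytest run with `-rA`, whose short summary prints lines such as `PASSED tests/test_a.py::test_x` or `FAILED tests/test_a.py::test_y - AssertionError`. The helper names are hypothetical, and this is not SWE-Factory's grading code.

```python
import re

# Hypothetical parser for pytest's `-rA` short summary lines.
# SKIPPED lines use a different layout ("SKIPPED [1] file:line: reason") and are ignored.
SUMMARY_LINE = re.compile(r"^(PASSED|FAILED|ERROR) (\S+)")


def parse_pytest_summary(log: str) -> dict[str, str]:
    """Map each test id in a pytest summary to its status (ids with spaces are not handled)."""
    statuses = {}
    for line in log.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            status, test_id = match.groups()
            statuses[test_id] = status
    return statuses


def fail_to_pass(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Tests passing after the gold patch that did not pass before (absent counts as not passing)."""
    return sorted(t for t, s in after.items() if s == "PASSED" and before.get(t) != "PASSED")


def pass_to_pass(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Tests passing both before and after the gold patch."""
    return sorted(t for t, s in after.items() if s == "PASSED" and before.get(t) == "PASSED")


before_log = "FAILED tests/test_a.py::test_new - AssertionError\nPASSED tests/test_a.py::test_old\n"
after_log = "PASSED tests/test_a.py::test_new\nPASSED tests/test_a.py::test_old\n"
before, after = parse_pytest_summary(before_log), parse_pytest_summary(after_log)
print(fail_to_pass(before, after))  # ['tests/test_a.py::test_new']
print(pass_to_pass(before, after))  # ['tests/test_a.py::test_old']
```

Under this sketch, an instance is kept only if `fail_to_pass` is non-empty. A framework-specific parser like this one is exactly the per-repository code that the abstract describes writing by hand.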
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques
Methods: Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · GPT-4
