Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
Liang Zeng, Yongcong Li, Yuzhen Xiao, Changshi Li, Chris Yuhao Liu, Rui Yan, Tianwen Wei, Jujie He, Xuchen Song, Yang Liu, Yahui Zhou

TL;DR
This paper introduces Skywork-SWE, a large-scale dataset of software engineering tasks built by an automated curation pipeline. It shows that LLM performance on these tasks keeps improving as the training data grows, and it sets new state-of-the-art results.
Contribution
The paper presents an automated pipeline for scaling SWE datasets, yielding 10,169 Python task instances from 2,531 GitHub repositories, and reports a data scaling law for the software engineering capabilities of LLMs.
Findings
Model performance keeps improving as the amount of training data increases, with no sign of saturation.
The Skywork-SWE model achieves 38.0% pass@1 accuracy on SWE-bench Verified.
Performance further improves to 47.0% with test-time scaling techniques.
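As a rough illustration of the two numbers above: pass@1 is the fraction of tasks resolved with a single attempt per task, and one common form of test-time scaling is best-of-N selection, where several attempts are sampled per task and a scoring function picks the one to submit. The sketch below assumes that form; this summary does not say which technique the paper uses. The names `Attempt`, `pass_at_1`, `best_of_n`, and the `score` field are hypothetical, and the data is a toy example, not a result from the paper.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    resolved: bool  # did this patch make the task's tests pass?
    score: float    # hypothetical verifier score, higher is better


def pass_at_1(first_attempts: list[bool]) -> float:
    """Fraction of tasks resolved when each task gets exactly one attempt."""
    if not first_attempts:
        raise ValueError("need at least one task")
    return sum(first_attempts) / len(first_attempts)


def best_of_n(attempts_per_task: list[list[Attempt]]) -> float:
    """Resolve rate when the highest-scoring attempt is submitted per task."""
    picked = [
        max(attempts, key=lambda a: a.score).resolved
        for attempts in attempts_per_task
    ]
    return pass_at_1(picked)


# Toy data for illustration only.
toy = [
    [Attempt(False, 0.2), Attempt(True, 0.9)],
    [Attempt(False, 0.5), Attempt(False, 0.4)],
]
single = pass_at_1([attempts[0].resolved for attempts in toy])
selected = best_of_n(toy)
```

The gain from selection depends entirely on how well the score ranks resolved attempts above unresolved ones; a scorer that ranks at random gives no improvement on average.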
Abstract
Software engineering (SWE) has recently emerged as a crucial testbed for next-generation LLM agents, demanding inherent capabilities in two critical dimensions: sustained iterative problem-solving (e.g., >50 interaction rounds) and long-context dependency resolution (e.g., >32k tokens). However, the data curation process in SWE remains notoriously time-consuming, as it heavily relies on manual annotation for code file filtering and the setup of dedicated runtime environments to execute and validate unit tests. Consequently, most existing datasets are limited to only a few thousand GitHub-sourced instances. To this end, we propose an incremental, automated data-curation pipeline that systematically scales both the volume and diversity of SWE datasets. Our dataset comprises 10,169 real-world Python task instances from 2,531 distinct GitHub repositories, each accompanied by a task…
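The unit-test validation step mentioned in the abstract can be sketched as a simple filter. The criterion below follows a convention common to SWE-bench-style datasets: keep an instance only if applying the reference patch turns at least one failing test into a passing one and breaks no test that passed before. Whether the paper's pipeline uses exactly this rule is an assumption, and `is_valid_instance` is a hypothetical name.

```python
def is_valid_instance(before: dict[str, bool], after: dict[str, bool]) -> bool:
    """Keep an instance only if the patch fixes something and breaks nothing.

    `before` and `after` map test names to True (pass) or False (fail),
    recorded without and with the reference patch applied. A test missing
    from one of the runs is treated as failing in that run.
    """
    fail_to_pass = [
        name for name, ok in after.items() if ok and not before.get(name, False)
    ]
    pass_to_fail = [
        name for name, ok in before.items() if ok and not after.get(name, False)
    ]
    return bool(fail_to_pass) and not pass_to_fail
```

For example, `is_valid_instance({"test_a": False, "test_b": True}, {"test_a": True, "test_b": True})` accepts the instance, while an instance whose tests all pass before the patch is rejected because the tests cannot tell a correct fix from no fix.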
Taxonomy
Topics: Scientific Computing and Data Management · Big Data and Business Intelligence · Data Quality and Management
