Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs

Liang Zeng; Yongcong Li; Yuzhen Xiao; Changshi Li; Chris Yuhao Liu; Rui Yan; Tianwen Wei; Jujie He; Xuchen Song; Yang Liu; Yahui Zhou

arXiv:2506.19290 · cs.AI · June 25, 2025

TL;DR

This paper introduces Skywork-SWE, a large-scale dataset of software engineering tasks built with an automated curation pipeline. Models trained on it keep improving as the data size grows, and the resulting model sets new state-of-the-art results.

Contribution

The paper presents an automated pipeline for scaling SWE datasets, yielding 10,169 Python task instances from 2,531 GitHub repositories, and reveals a data scaling law for the software engineering capabilities of LLMs.

Findings

01. Model performance continues to improve as the training data grows, with no sign of saturation.

02. The Skywork-SWE-32B model achieves 38.0% pass@1 accuracy on SWE-bench Verified.

03. Performance further improves to 47.0% with test-time scaling techniques.
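
The pass@1 metric is not defined in this summary. As a minimal sketch, assuming it follows the standard unbiased pass@k estimator commonly used for code benchmarks (n sampled attempts per task, c of which pass the tests), it could be computed as below. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical, and the inputs are illustrative rather than results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n attempts sampled, c passed."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing attempts: every size-k subset contains a pass.
        return 1.0
    # 1 minus the probability that k attempts drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-task estimate over (n, c) pairs, one pair per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With a single attempt per task (n = 1, k = 1), the benchmark score reduces to the fraction of tasks whose attempt passes, which is the usual reading of a pass@1 figure reported without multiple rollouts.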

Abstract

Software engineering (SWE) has recently emerged as a crucial testbed for next-generation LLM agents, demanding inherent capabilities in two critical dimensions: sustained iterative problem-solving (e.g., >50 interaction rounds) and long-context dependency resolution (e.g., >32k tokens). However, the data curation process in SWE remains notoriously time-consuming, as it heavily relies on manual annotation for code file filtering and the setup of dedicated runtime environments to execute and validate unit tests. Consequently, most existing datasets are limited to only a few thousand GitHub-sourced instances. To this end, we propose an incremental, automated data-curation pipeline that systematically scales both the volume and diversity of SWE datasets. Our dataset comprises 10,169 real-world Python task instances from 2,531 distinct GitHub repositories, each accompanied by a task…

Code & Models

Models

Skywork/Skywork-SWE-32B (Hugging Face model)

Taxonomy

Topics: Scientific Computing and Data Management · Big Data and Business Intelligence · Data Quality and Management