Toward Training Superintelligent Software Agents through Self-Play SWE-RL

Yuxiang Wei; Zhiqing Sun; Emily McMilin; Jonas Gehring; David Zhang; Gabriel Synnaeve; Daniel Fried; Lingming Zhang; Sida Wang

arXiv:2512.18552 · cs.SE · May 20, 2026

TL;DR

This paper introduces Self-play SWE-RL (SSR), a reinforcement learning approach in which a single software agent learns to both inject and repair bugs under minimal data assumptions (only sandboxed repositories, with no human-labeled issues or tests), as a step toward superintelligent systems that exceed human capabilities.

Contribution

The paper proposes a self-play reinforcement learning paradigm for training software agents that requires only sandboxed repositories with source code and installed dependencies. It is presented as a first step toward autonomous bug fixing and, eventually, superintelligent software agents.

Findings

01

SSR gains +10.4 and +7.8 points on two SWE-bench benchmarks.

02

SSR outperforms the human-data baseline throughout training.

03

Agents can learn from real-world repositories without human-labeled data.

Abstract

While current software agents powered by large language models (LLMs) and agentic reinforcement learning (RL) can boost programmer productivity, their training data (e.g., GitHub issues and pull requests) and environments (e.g., pass-to-pass and fail-to-pass tests) heavily depend on human knowledge or curation, posing a fundamental barrier to superintelligence. In this paper, we present Self-play SWE-RL (SSR), a first step toward training paradigms for superintelligent software agents. Our approach takes minimal data assumptions, only requiring access to sandboxed repositories with source code and installed dependencies, with no need for human-labeled issues or tests. Grounded in these real-world codebases, a single LLM agent is trained via reinforcement learning in a self-play setting to iteratively inject and repair software bugs of increasing complexity, with each bug formally…
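The abstract describes one agent, trained with RL in a self-play setting, that alternately injects and repairs bugs in real repositories. The toy loop below is a minimal sketch of that two-role structure only, built on loudly stated assumptions that are not taken from the paper: the "repository" is a single Python function, bugs and fixes are drawn from a hypothetical fixed menu of edits (`EDITS`), tests are recorded from the original code's behavior rather than written by humans, and the injector is rewarded only for bugs that break at least one test. The names `SharedPolicy`, `make_tests`, `passes`, `self_play_round`, and `run` are illustrative, and the multiplicative preference update is a crude stand-in for a policy-gradient step on an LLM.

```python
import random
from typing import Callable, Dict, List, Tuple

Program = Callable[[int, int], int]
Tests = List[Tuple[Tuple[int, int], int]]


def original(a: int, b: int) -> int:
    """Hypothetical stand-in for the code inside a sandboxed repository."""
    return a + b


# Hypothetical edit menu shared by both roles (an assumption, not the paper's action space).
EDITS: Dict[str, Program] = {
    "identity": lambda a, b: a + b,        # leaves behavior unchanged
    "off_by_one": lambda a, b: a + b + 1,  # a real bug
    "subtract": lambda a, b: a - b,        # a real bug
}

PROBES: List[Tuple[int, int]] = [(0, 0), (1, 2), (5, -3)]


def make_tests(program: Program) -> Tests:
    """Record the original program's outputs as tests; no human-written tests are used."""
    return [(args, program(*args)) for args in PROBES]


def passes(program: Program, tests: Tests) -> bool:
    """True if the program reproduces every recorded output."""
    return all(program(*args) == want for args, want in tests)


class SharedPolicy:
    """One policy object used in both roles, with a preference table per role (assumption)."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.prefs = {role: {name: 1.0 for name in EDITS} for role in ("inject", "repair")}

    def act(self, role: str) -> str:
        names = list(EDITS)
        weights = [self.prefs[role][n] for n in names]
        return self.rng.choices(names, weights=weights, k=1)[0]

    def update(self, role: str, action: str, reward: float, lr: float = 0.5) -> None:
        # Reward is +1 or -1, so the weight is scaled by 1.5 or 0.5 and stays positive.
        self.prefs[role][action] *= 1.0 + lr * reward


def self_play_round(policy: SharedPolicy) -> Tuple[bool, bool]:
    """One inject-then-repair episode; returns (bug_was_valid, repair_succeeded)."""
    tests = make_tests(original)
    bug = policy.act("inject")
    valid = not passes(EDITS[bug], tests)  # a bug counts only if some test now fails
    policy.update("inject", bug, 1.0 if valid else -1.0)
    if not valid:
        return False, False
    fix = policy.act("repair")
    solved = passes(EDITS[fix], tests)  # the repair must restore all recorded behavior
    policy.update("repair", fix, 1.0 if solved else -1.0)
    return True, solved


def run(rounds: int = 200, seed: int = 0) -> Tuple[Dict[str, int], SharedPolicy]:
    policy = SharedPolicy(random.Random(seed))
    stats = {"valid_bugs": 0, "solved": 0}
    for _ in range(rounds):
        valid, solved = self_play_round(policy)
        stats["valid_bugs"] += valid
        stats["solved"] += solved
    return stats, policy
```

Calling `run(rounds=200, seed=0)` quickly shifts the injector away from the no-op edit and the repairer toward the edit that restores the recorded behavior. In this toy the two roles keep separate tables inside one object; with an LLM agent, both roles would presumably share the same weights and differ only in their instructions, which is an assumption about how a single agent plays both sides.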

Taxonomy

Topics: Software Engineering Research · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics