From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents

Nikolai Ludwig; Wasi Uddin Ahmad; Somshubra Majumdar; Boris Ginsburg

arXiv:2604.01496·cs.SE·May 7, 2026

TL;DR

This paper presents SWE-ZERO to SWE-HERO, a two-stage supervised fine-tuning (SFT) recipe for software engineering agents. Large-scale execution-free training comes first, followed by targeted execution-backed refinement. The authors report state-of-the-art results among open-source models of comparable size.

Contribution

The paper introduces a two-stage SFT pipeline that distills an open-weight frontier LLM (Qwen3-Coder-480B) into agents based on the Qwen2.5-Coder series. It also releases 300k SWE-ZERO and 13k SWE-HERO trajectories.

Findings

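01. SWE-HERO-32B achieves a 62.2% resolution rate on SWE-bench Verified.

02. Despite being trained exclusively on Python, the agents reach 44.1% in zero-shot transfer to multilingual SWE-bench.

03. The pipeline replaces resource-heavy dependencies with an evolutionary refinement strategy: large-scale execution-free training first, then targeted execution-backed refinement.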

Abstract

We introduce SWE-ZERO to SWE-HERO, a two-stage SFT recipe that achieves state-of-the-art results on SWE-bench by distilling open-weight frontier LLMs. Our pipeline replaces resource-heavy dependencies with an evolutionary refinement strategy: (1) SWE-ZERO utilizes large-scale, execution-free trajectories to master code semantics and repository-level reasoning, and (2) SWE-HERO applies targeted, execution-backed refinement to transition these semantic intuitions into rigorous engineering workflows. Our empirical results set a new benchmark for open-source models of comparable size. We release a dataset of 300k SWE-ZERO and 13k SWE-HERO trajectories distilled from Qwen3-Coder-480B, alongside a suite of agents based on the Qwen2.5-Coder series. Notably, SWE-HERO-32B achieves a 62.2% resolution rate on SWE-bench Verified. Furthermore, despite being trained exclusively on Python, our agents…
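
The two-stage ordering described in the abstract can be sketched as a simple data schedule. This is a minimal illustration only, not the authors' training code: the `Stage` dataclass, `build_schedule`, `execution_backed_fraction`, and all field names are hypothetical. Only the stage names and trajectory counts (300k and 13k) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical description of one SFT stage (not from the paper's code)."""
    name: str               # stage label as used in the abstract
    num_trajectories: int   # released dataset size stated in the abstract
    uses_execution: bool    # True if the trajectories are execution-backed


def build_schedule() -> list[Stage]:
    # Order matters: broad execution-free SFT first,
    # then a much smaller execution-backed refinement stage.
    return [
        Stage("SWE-ZERO", 300_000, uses_execution=False),
        Stage("SWE-HERO", 13_000, uses_execution=True),
    ]


def execution_backed_fraction(schedule: list[Stage]) -> float:
    # Share of all trajectories that are execution-backed.
    total = sum(s.num_trajectories for s in schedule)
    backed = sum(s.num_trajectories for s in schedule if s.uses_execution)
    return backed / total


if __name__ == "__main__":
    schedule = build_schedule()
    for stage in schedule:
        print(stage.name, stage.num_trajectories, stage.uses_execution)
    print(f"{execution_backed_fraction(schedule):.1%} execution-backed")
```

With the released counts, execution-backed trajectories make up 13,000 of 313,000, a little over 4% of the total. That ratio is one way to read the abstract's claim about replacing resource-heavy dependencies.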
