SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents

Patrick Tser Jern Kon; Archana Pradeep; Ang Chen; Alexander P. Ellis; Warren Hunt; Zijian Wang; John Yang; Samuel Thompson

arXiv:2602.22124 · cs.SE · February 26, 2026

Open Access

TL;DR

SWE-Protégé enhances small language models' software engineering capabilities by enabling selective expert collaboration, significantly improving performance on long-horizon tasks with minimal expert guidance.

Contribution

Introduces a post-training framework that trains small language models to selectively seek expert guidance, reducing looping and improving task success in software engineering.

Findings

01. Achieves 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement.

02. Uses expert assistance sparingly (~4 calls per task).

03. Combines supervised fine-tuning with reinforcement learning to improve decision-making.

Abstract

Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action looping and low resolution rates. We introduce SWE-Protégé, a post-training framework that reframes software repair as an expert-protégé collaboration problem. In SWE-Protégé, an SLM remains the sole decision-maker while learning to selectively seek guidance from a strong expert model, recognize stalled states, and follow through on expert feedback. Our approach combines supervised fine-tuning on expert-augmented trajectories with agentic reinforcement learning that explicitly discourages degenerative looping and unproductive expert collaboration. We lightly post-train Qwen2.5-Coder-7B-Instruct to achieve 42.4% Pass@1 on SWE-bench Verified,…

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software Testing and Debugging Techniques