SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents
Yifeng Ding, Lingming Zhang

TL;DR
SWE-Replay is a novel, efficient test-time scaling method for software engineering agents that recycles prior trajectories to reduce costs and improve performance without relying on noisy value estimates.
Contribution
The paper introduces SWE-Replay, the first generalizable and cost-effective test-time scaling technique that dynamically balances exploration and exploitation by reusing trajectories based on their significance.
Findings
Reduces scaling costs by up to 17.4%
Maintains performance or improves it by up to 3.8%
Demonstrates effectiveness across multiple SWE benchmarks
Abstract
Test-time scaling has been widely adopted to enhance the capabilities of Large Language Model (LLM) agents in software engineering (SWE) tasks. However, the standard approach of repeatedly sampling trajectories from scratch is computationally expensive. While recent methods have attempted to mitigate costs using specialized value agents, they can suffer from model miscalibration and fail to generalize to modern agents that synthesize custom bash scripts as tools. In this paper, we introduce SWE-Replay, the first efficient and generalizable test-time scaling technique for modern agents without reliance on potentially noisy value estimates. SWE-Replay optimizes the scaling process by recycling trajectories from prior trials, dynamically choosing to either explore from scratch or exploit archived experience by branching at critical intermediate steps. This selection of intermediate steps…
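The explore-or-replay loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is cut off before it explains how branch points are selected, so the sketch substitutes a uniform random choice as a placeholder. All names here (`Trajectory`, `rollout`, `scale`, `explore_prob`) are hypothetical, and the "agent" is a stub that emits dummy action strings.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """A finished agent run, stored as an ordered list of step labels."""
    steps: list = field(default_factory=list)


def rollout(prefix, rng, max_steps=5):
    """Hypothetical stand-in for an agent: extend a prefix to max_steps."""
    steps = list(prefix)
    while len(steps) < max_steps:
        steps.append(f"action-{rng.randint(0, 9)}")
    return Trajectory(steps)


def scale(num_trials, explore_prob, rng, max_steps=5):
    """Run num_trials trials, each either from scratch or from a replayed prefix.

    Returns the archive of trajectories and the number of newly sampled
    steps, used here as a crude proxy for scaling cost.
    """
    archive = []
    sampled_steps = 0
    for _ in range(num_trials):
        if not archive or rng.random() < explore_prob:
            # Explore: start a fresh trajectory from scratch.
            prefix = []
        else:
            # Exploit: reuse an archived trajectory up to a branch point.
            # ASSUMPTION: uniform random selection is a placeholder; the
            # paper's actual selection rule is not given in this excerpt.
            source = rng.choice(archive)
            branch_at = rng.randint(1, max_steps - 1)
            prefix = source.steps[:branch_at]
        trajectory = rollout(prefix, rng, max_steps)
        sampled_steps += max_steps - len(prefix)
        archive.append(trajectory)
    return archive, sampled_steps


if __name__ == "__main__":
    archive, cost = scale(num_trials=8, explore_prob=0.5, rng=random.Random(0))
    print(f"{len(archive)} trajectories, {cost} newly sampled steps")
```

In this toy setup, every replayed prefix skips steps that would otherwise be sampled again, which is the intuition behind the cost reduction. The size of any real saving depends on the agent and on the selection rule, neither of which is modeled here.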
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Mobile Crowdsensing and Crowdsourcing
