Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents
Karina Zainullina, Alexander Golubev, Maria Trofimova, Sergei Polezhaev, Ibragim Badertdinov, Daria Litvintseva, Simon Karasik, Filipp Fisin, Sergei Skvortsov, Maksim Nekrashevich, Anton Shevtsov, Boris Yangel

TL;DR
This paper introduces guided search strategies for non-serializable environments, where intermediate states cannot be easily saved and restored, to improve large language model performance on agentic software engineering tasks, reaching a 40.8% success rate on SWE-bench Verified.
Contribution
It proposes two search strategies for non-serializable environments, 1-step lookahead and trajectory selection, both guided by a learned action-value function estimator, that raise the success rates of LLMs on complex multi-step tasks.
Findings
Doubled the success rate of a fine-tuned model on SWE-bench Verified
Achieved a 40.8% success rate, setting a new state of the art
The techniques transfer to advanced closed models such as GPT-4o and improve their performance
Abstract
Large language models (LLMs) have recently achieved remarkable results in complex multi-step tasks, such as mathematical reasoning and agentic software engineering. However, they often struggle to maintain consistent performance across multiple solution attempts. One effective approach to narrow the gap between average-case and best-case performance is guided test-time search, which explores multiple solution paths to identify the most promising one. Unfortunately, effective search techniques (e.g. MCTS) are often unsuitable for non-serializable RL environments, such as Docker containers, where intermediate environment states cannot be easily saved and restored. We investigate two complementary search strategies applicable to such environments: 1-step lookahead and trajectory selection, both guided by a learned action-value function estimator. On the SWE-bench Verified benchmark, a key…
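The two strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `propose`, `q_value`, `one_step_lookahead`, and `select_trajectory` are hypothetical, and scoring a finished trajectory by the estimated value of its last step is an assumption made for illustration. Neither function saves or restores environment state, which is what makes this style of search usable in a Docker-like environment.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration: a state is the agent's view of the
# interaction history, an action is a single agent step (e.g. a shell command).
State = str
Action = str
Step = Tuple[State, Action]


def one_step_lookahead(
    state: State,
    propose: Callable[[State], Action],
    q_value: Callable[[State, Action], float],
    k: int = 4,
) -> Action:
    """Sample k candidate actions and return the one with the highest
    estimated action value. Only the returned action would be executed,
    so the environment never has to be rolled back."""
    candidates: List[Action] = [propose(state) for _ in range(k)]
    return max(candidates, key=lambda action: q_value(state, action))


def select_trajectory(
    trajectories: Sequence[List[Step]],
    q_value: Callable[[State, Action], float],
) -> List[Step]:
    """Pick one of several independently completed runs.
    Assumption: each run is scored by the estimated value of its final step."""
    return max(trajectories, key=lambda steps: q_value(*steps[-1]))
```

In this sketch the two functions are complementary: lookahead steers a single run at every step, while trajectory selection chooses among several finished runs, each executed from a fresh environment.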
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Data Mining Algorithms and Applications · Semantic Web and Ontologies
