Scaling Test-time Compute for LLM Agents

King Zhu; Hanhao Li; Siwei Wu; Tianshun Xing; Dehua Ma; Xiangru Tang; Minghao Liu; Jian Yang; Jiaheng Liu; Yuchen Eleanor Jiang; Changwang Zhang; Chenghua Lin; Jun Wang; Ge Zhang; Wangchunshu Zhou

arXiv:2506.12928·cs.AI·June 17, 2025

TL;DR

This paper systematically explores how applying various test-time scaling strategies can enhance the effectiveness of large language model agents, focusing on methods like parallel sampling, revision, verification, and diversification.

Contribution

It introduces and evaluates multiple test-time scaling strategies for language agents, providing insights into their impact on performance and guiding future improvements.

Findings

01. Scaling test-time compute improves agent performance.

02. Reflecting at appropriate times is crucial for effectiveness.

03. List-wise verification and merging outperform other methods.

Abstract

Scaling test-time compute has shown remarkable success in improving the reasoning abilities of large language models (LLMs). In this work, we conduct the first systematic exploration of applying test-time scaling methods to language agents and investigate the extent to which they improve agent effectiveness. Specifically, we explore different test-time scaling strategies, including: (1) parallel sampling algorithms; (2) sequential revision strategies; (3) verifiers and merging methods; (4) strategies for diversifying rollouts. We carefully analyze and ablate the impact of different design strategies for applying test-time scaling to language agents, and report the following findings: 1. Scaling test-time compute can improve the performance of agents. 2. Knowing when to reflect is important for agents. 3. Among different verification and result merging approaches, the list-wise method performs…
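To make two of the strategies named in the abstract concrete, the following is a minimal sketch of parallel sampling followed by list-wise selection. It is not the paper's implementation: `sample_rollouts`, `listwise_select`, `toy_agent`, and `toy_listwise_verifier` are hypothetical names introduced here, the agent is a random stand-in for an LLM rollout, and the verifier is a simple majority vote standing in for a model that sees all candidates at once.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def sample_rollouts(
    agent: Callable[[str, random.Random], str],
    task: str,
    n: int,
    seed: int = 0,
) -> List[str]:
    """Draw n independent candidate answers (run serially here for simplicity)."""
    rng = random.Random(seed)
    return [agent(task, rng) for _ in range(n)]


def listwise_select(
    candidates: Sequence[str],
    verifier: Callable[[Sequence[str]], int],
) -> str:
    """Show the verifier the whole candidate list and return the one it picks."""
    if not candidates:
        raise ValueError("need at least one candidate")
    index = verifier(candidates)
    if not 0 <= index < len(candidates):
        raise ValueError("verifier returned an out-of-range index")
    return candidates[index]


def toy_agent(task: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one agent rollout: a noisy answer to the task."""
    return rng.choice(["4", "5", "4", "22"])


def toy_listwise_verifier(candidates: Sequence[str]) -> int:
    """Hypothetical verifier: pick the most frequent answer (first seen wins ties)."""
    counts = Counter(candidates)
    best = max(counts, key=counts.get)
    return candidates.index(best)


if __name__ == "__main__":
    rollouts = sample_rollouts(toy_agent, "What is 2 + 2?", n=8, seed=0)
    print(rollouts)
    print(listwise_select(rollouts, toy_listwise_verifier))
```

The design point this illustrates is that a list-wise verifier compares candidates against each other in a single call, rather than scoring each rollout in isolation. In a real agent, the assumed verifier would typically be an LLM prompt or a trained model rather than a vote.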

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Mobile Agent-Based Network Management · Semantic Web and Ontologies