Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling

Hongbeen Kim; Juhyun Lee; Sanghyeon Lee; Kwanghoon Choi; Jaehyuk Huh

arXiv:2604.00510·cs.AI·April 2, 2026

TL;DR

This paper introduces adaptive techniques for Monte Carlo Tree Search (MCTS) that reduce tail latency and improve throughput in large language model reasoning by pruning unproductive searches and reallocating the reclaimed compute.

Contribution

The paper presents negative early exit and an adaptive boosting mechanism, both integrated into vLLM, to make MCTS-based reasoning more efficient.

Findings

01. Reduced p99 end-to-end latency in MCTS-based reasoning.

02. Improved throughput without sacrificing accuracy.

03. Effective resource reallocation among concurrent searches.

Abstract

Monte Carlo Tree Search (MCTS) is an effective test-time compute scaling (TTCS) method for improving the reasoning performance of large language models, but its highly variable execution time leads to severe long-tail latency in practice. Existing optimizations, such as positive early exit, reduce latency in favorable cases but are less effective when search continues without meaningful progress. We introduce negative early exit, which prunes unproductive MCTS trajectories, and an adaptive boosting mechanism that reallocates reclaimed computation to reduce resource contention among concurrent searches. Integrated into vLLM, these techniques substantially reduce p99 end-to-end latency while improving throughput and maintaining reasoning accuracy.
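
The abstract does not specify the exit criterion or the reallocation policy, so the following is only a toy sketch of how the two ideas could fit together, not the authors' method or any vLLM API. It assumes a search is "unproductive" when its best reward has not improved for `patience` consecutive rollouts, and that reclaimed rollouts are split evenly among the searches still running. `SearchState`, `record_rollout`, `boost`, `patience`, and `min_gain` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchState:
    """Toy stand-in for one concurrent MCTS search (hypothetical fields)."""
    name: str
    budget: int                 # rollouts this search may still run
    best_reward: float = 0.0    # best rollout reward seen so far
    stale_rollouts: int = 0     # rollouts since best_reward last improved
    done: bool = False


def record_rollout(state, reward, patience, min_gain=1e-3):
    """Account for one rollout.

    Returns the number of rollouts reclaimed by a negative early exit,
    or 0 if the search keeps running or simply ran out of budget.
    The staleness rule is an assumption, not the paper's criterion.
    """
    state.budget -= 1
    if reward > state.best_reward + min_gain:
        state.best_reward = reward
        state.stale_rollouts = 0
    else:
        state.stale_rollouts += 1

    if state.stale_rollouts >= patience:
        # Negative early exit: stop the search and hand back what is left.
        reclaimed = state.budget
        state.budget = 0
        state.done = True
        return reclaimed

    if state.budget <= 0:
        state.done = True
    return 0


def boost(states, reclaimed):
    """Split reclaimed rollouts evenly across searches that are still active.

    Even splitting is an assumed policy chosen for simplicity.
    """
    active = [s for s in states if not s.done]
    if not active or reclaimed <= 0:
        return
    share, extra = divmod(reclaimed, len(active))
    for i, s in enumerate(active):
        s.budget += share + (1 if i < extra else 0)


# Usage: search "a" stalls and exits early; its leftover budget goes to "b".
a = SearchState("a", budget=10)
b = SearchState("b", budget=10)
reclaimed = 0
for r in (0.5, 0.5, 0.5):
    reclaimed += record_rollout(a, r, patience=2)
boost([a, b], reclaimed)
```

In this sketch the pruned search returns its unused rollouts as soon as it stalls, so the remaining search gets a larger budget instead of competing with a search that is no longer making progress.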
