Enhancing Test-Time Scaling of Large Language Models with Hierarchical Retrieval-Augmented MCTS

Alex ZH Dou; Zhongwei Wan; Dongfei Cui; Xin Wang; Jing Xiong; Haokun Lin; Chaofan Tao; Shen Yan; Mi Zhang

arXiv:2507.05557·cs.CL·July 9, 2025

TL;DR

This paper introduces R2-LLMs, a hierarchical retrieval-augmented reasoning framework that improves test-time scaling of large language models by enhancing in-context learning and step-wise reasoning without needing distillation.

Contribution

The paper presents a novel hierarchical retrieval-augmented reasoning framework, R2-LLMs, that enhances test-time inference in LLMs through dual-level retrieval and MCTS, without requiring CoT training data.

Findings

1. Up to 16% performance improvement on reasoning benchmarks.
2. Effective integration of hierarchical retrieval with MCTS.
3. Robust reasoning enhancement without model distillation.

Abstract

Test-time scaling has emerged as a promising paradigm in language modeling, leveraging additional computational resources at inference time to enhance model performance. In this work, we introduce R2-LLMs, a novel and versatile hierarchical retrieval-augmented reasoning framework designed to improve test-time scaling in large language models (LLMs) without requiring distillation from more advanced models to obtain chain-of-thought (CoT) training data. R2-LLMs enhances inference-time generalization by integrating dual-level retrieval-based in-context learning: (1) At the coarse level, our approach extracts abstract templates from complex reasoning problems and retrieves similar problem-answer pairs to facilitate high-level in-context learning; (2) At the fine level, during Monte Carlo Tree Search (MCTS), R2-LLMs efficiently retrieves analogous intermediate solution steps from reference…
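The coarse-level step described in the abstract (abstract a problem into a template, then retrieve similar problem-answer pairs as in-context examples) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the number-masking abstraction in `abstract_template`, the bag-of-words cosine similarity, the toy `REFERENCE` set, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def abstract_template(problem: str) -> str:
    """Hypothetical abstraction: lowercase the text and mask every number."""
    return re.sub(r"\d+(?:\.\d+)?", "<num>", problem.lower())


def bag_of_words(text: str) -> Counter:
    """Count word tokens, treating the <num> placeholder as a single token."""
    return Counter(re.findall(r"<num>|[a-z]+", text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, reference: list, k: int = 2) -> list:
    """Return the k reference (problem, answer) pairs whose templates best match the query."""
    q = bag_of_words(abstract_template(query))
    ranked = sorted(
        reference,
        key=lambda pair: cosine(q, bag_of_words(abstract_template(pair[0]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, examples: list) -> str:
    """Assemble retrieved pairs into a few-shot prompt that ends with the query."""
    shots = "\n\n".join(f"Problem: {p}\nAnswer: {a}" for p, a in examples)
    return f"{shots}\n\nProblem: {query}\nAnswer:"


# Toy reference set (made up for this sketch).
REFERENCE = [
    ("Tom has 3 apples and buys 4 more. How many apples does he have?", "7"),
    ("What is the area of a rectangle with sides 5 and 6?", "30"),
    ("Sara has 10 apples and buys 2 more. How many apples does she have?", "12"),
]

QUERY = "Ann has 8 apples and buys 5 more. How many apples does she have?"
PROMPT = build_prompt(QUERY, retrieve(QUERY, REFERENCE, k=2))
```

In this sketch the two apple problems rank above the rectangle problem because their masked templates share almost every token with the query. A real system would more plausibly use learned embeddings for the similarity step; the bag-of-words scoring is only a stand-in that keeps the example dependency-free.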

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques