Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models

Yuan Sui; Yufei He; Tri Cao; Simeng Han; Yulin Chen; Bryan Hooi

arXiv:2502.19918·cs.AI·May 8, 2026

TL;DR

Meta-Reasoner introduces a dynamic, strategy-adapting framework using contextual multi-armed bandits to improve inference efficiency and accuracy in large language models during multi-step reasoning tasks.

Contribution

The paper presents a meta-guidance framework that lets LLMs adapt their reasoning strategies in real time, improving both accuracy and inference efficiency.

Findings

01. Outperforms previous SOTA methods by 9-12% in accuracy.

02. Reduces inference time by 28-35% under the same compute budget.

03. Demonstrates generalizability across diverse reasoning tasks.

Abstract

Large Language Models (LLMs) often struggle with computational efficiency and error propagation in multi-step reasoning tasks. While recent advances in prompting and post-training have enabled LLMs to perform step-wise reasoning, they still tend to explore unproductive solution paths without effective backtracking or strategy adjustment. In this paper, we propose Meta-Reasoner, a new framework that empowers LLMs to "think about how to think". It optimizes the inference process by dynamically adapting reasoning strategies in real time. Our approach employs contextual multi-armed bandits (CMABs) to learn an adaptive policy that evaluates the current state of the LLM's reasoning and selects the strategy most likely to lead to a successful outcome during inference, such as whether to backtrack, switch to a new approach, or restart the problem-solving process. This…
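To make the bandit idea in the abstract concrete, here is a minimal Python sketch of a contextual bandit that picks a reasoning strategy from a summary of the current reasoning state. It is an illustration under stated assumptions, not the paper's method: the excerpt does not say which bandit algorithm, features, or reward signal Meta-Reasoner uses. The epsilon-greedy rule, the two-feature context `[is_progressing, is_stuck]`, the `toy_reward` function, and the `continue` arm are all hypothetical. The other strategy names come from the examples in the abstract.

```python
import random

# Strategy names: "continue" is an assumption; the rest follow the abstract's examples.
STRATEGIES = ["continue", "backtrack", "switch_approach", "restart"]


class EpsilonGreedyContextualBandit:
    """Per-arm linear reward model with epsilon-greedy exploration.

    An illustrative stand-in, not the algorithm used in the paper.
    """

    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        # One weight vector per arm, initialised to zero.
        self.weights = [[0.0] * n_features for _ in range(n_arms)]

    def predict(self, arm, context):
        """Estimated reward of `arm` in `context` (dot product)."""
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def best(self, context):
        """Greedy choice: the arm with the highest estimated reward."""
        scores = [self.predict(a, context) for a in range(self.n_arms)]
        return max(range(self.n_arms), key=scores.__getitem__)

    def select(self, context):
        """Explore a random arm with probability epsilon, else act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return self.best(context)

    def update(self, arm, context, reward):
        """One gradient step on the squared error of the chosen arm's estimate."""
        error = reward - self.predict(arm, context)
        self.weights[arm] = [
            w + self.lr * error * x for w, x in zip(self.weights[arm], context)
        ]


def toy_reward(arm, context):
    """Hypothetical reward: backtracking helps when stuck, continuing helps otherwise."""
    is_stuck = context[1] == 1.0
    if is_stuck:
        return 1.0 if STRATEGIES[arm] == "backtrack" else 0.0
    return 1.0 if STRATEGIES[arm] == "continue" else 0.0


def run_demo(rounds=2000, seed=0):
    """Train the bandit on randomly drawn toy reasoning states."""
    bandit = EpsilonGreedyContextualBandit(len(STRATEGIES), n_features=2, seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(rounds):
        stuck = env_rng.random() < 0.5
        context = [0.0, 1.0] if stuck else [1.0, 0.0]  # [is_progressing, is_stuck]
        arm = bandit.select(context)
        bandit.update(arm, context, toy_reward(arm, context))
    return bandit


if __name__ == "__main__":
    trained = run_demo()
    print("when stuck:", STRATEGIES[trained.best([0.0, 1.0])])
    print("when progressing:", STRATEGIES[trained.best([1.0, 0.0])])
```

In a real system the context would be derived from the model's partial reasoning trace and the reward from some measure of progress. Both are replaced here by a two-state toy environment so the loop of select, observe reward, and update is easy to follow.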
