Rational Metareasoning for Large Language Models
C. Nicolò De Sabbata, Theodore R. Sumers, Badr AlKhamissi, Antoine Bosselut, Thomas L. Griffiths

TL;DR
This paper proposes a metareasoning-based training method for large language models that selectively employs reasoning steps, reducing inference costs by 20-37% while maintaining performance.
Contribution
It introduces a novel reward function and training approach inspired by cognitive science to optimize reasoning efficiency in LLMs.
Findings
Reduces inference costs by 20-37% across models
Maintains task performance comparable to existing methods
Uses a reward function based on the Value of Computation
Abstract
Being prompted to engage in reasoning has emerged as a core technique for using large language models (LLMs), deploying additional inference-time compute to improve task performance. However, as LLMs increase in both size and adoption, inference costs are correspondingly becoming increasingly burdensome. How, then, might we optimize reasoning's cost-performance tradeoff? This work introduces a novel approach based on computational models of metareasoning used in cognitive science, training LLMs to selectively use intermediate reasoning steps only when necessary. We first develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning, then use this reward function with Expert Iteration to train the LLM. Compared to few-shot chain-of-thought prompting and STaR, our method significantly reduces inference costs (20-37% fewer tokens generated across…
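The reward idea in the abstract can be illustrated with a small sketch. The abstract only says that the reward incorporates the Value of Computation by penalizing unnecessary reasoning and is then used with Expert Iteration; the exact form below (utility gain minus a linear per-token cost), the `Candidate` structure, and the names `voc_reward`, `select_for_training`, `p_direct`, and `cost_per_token` are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One sampled response to a question (hypothetical structure)."""
    reasoning_tokens: int  # tokens spent on intermediate reasoning (0 = direct answer)
    p_correct: float       # estimated probability of the correct answer after reasoning


def voc_reward(c: Candidate, p_direct: float, cost_per_token: float) -> float:
    """Assumed VOC-style reward: utility gained by reasoning minus its token cost."""
    utility_gain = c.p_correct - p_direct
    return utility_gain - cost_per_token * c.reasoning_tokens


def select_for_training(cands: List[Candidate], p_direct: float,
                        cost_per_token: float) -> Candidate:
    """Filtering step in the spirit of Expert Iteration: keep the top-reward sample."""
    return max(cands, key=lambda c: voc_reward(c, p_direct, cost_per_token))


# Made-up numbers for illustration only.
easy = [Candidate(0, 0.90), Candidate(50, 0.95)]
hard = [Candidate(0, 0.20), Candidate(50, 0.90)]
best_easy = select_for_training(easy, p_direct=0.90, cost_per_token=0.002)
best_hard = select_for_training(hard, p_direct=0.20, cost_per_token=0.002)
```

With these made-up numbers, the easy question keeps the direct answer (50 tokens of reasoning cost more than the small accuracy gain), while the hard question keeps the reasoning sample. Under this assumed form, training on the selected samples would push a model to reason only where it pays off.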
Peer Reviews
Decision · Submitted to ICLR 2025
- Computational efficiency of LLM deployments is a timely topic relevant to the ICLR community.
- The proposed method is sufficiently simple that something like it might well get used in practice.
- The paper is well-written and clearly structured.
### a) Insufficient baselines and ablations
I don't feel like I get the "shape" of the proposed method and the potential alternatives all that well. A few notes here:
1. How necessary is rationalization (described in the paragraph on line 131)? I assume the authors only use it because it's also used in STaR, the main baseline? Relatedly (and importantly), STaR should be described in more detail in the paper, and justified as a relevant baseline -- I needed to open up the original STaR paper
* The paper introduces a unique rational metareasoning approach that balances inference cost with performance, addressing a crucial need in the efficient deployment of LLMs.
* The integration of the Value of Computation (VOC)-based reward function is well-designed and thoughtfully applied, showing careful consideration of LLM efficiency.
* The approach is tested across a diverse set of benchmarks, covering science knowledge, commonsense reasoning, math problem-solving, and logical deduction, as
* Limited Analysis of Time Complexity: While the paper focuses on token reduction, it lacks an analysis of the time complexity of the proposed method. A deeper investigation into time savings would provide a clearer picture of its practical efficiency.
* Narrow Range of LLMs and Tasks: The method was primarily tested on a limited selection of benchmarks and model architectures. Broader experimentation across different LLMs and a wider range of tasks would strengthen claims about the generalizabi
1. The authors introduce an interesting problem in LLM reasoning: optimizing LLMs' inference cost and performance at the same time. This is an important issue in using LLMs, especially considering that LLM inference costs are growing.
2. The paper is well-written and easy to follow. Overall, I could follow the whole story that the authors want to present in this paper.
1. Lack of experiments on more realistic datasets. LLMs are not limited to tasks in text space; they are frequently utilized as agents that interact with external tools to perform complex tasks in various environments. Incorporating experiments on more realistic datasets, such as GAIA [1] and ToolBench [2], would provide valuable insights into the model's performance in more complex reasoning scenarios.
2. Currently, the method and experiments focus exclusively on CoT reasoning, generating traject
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Natural Language Processing Techniques · Topic Modeling
