ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition
Muyang Zhao, Qi Qi, Hao Sun

TL;DR
This paper introduces ROI-Reasoning, a framework that enables large language models to strategically allocate computational resources during reasoning tasks by predicting task difficulty and optimizing decision-making under strict token budgets.
Contribution
It formalizes budgeted inference as an Ordered Stochastic Multiple-Choice Knapsack Problem (OS-MCKP) and develops a two-stage approach that combines meta-cognitive fine-tuning with reinforcement learning for budget-aware reasoning.
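For orientation, the classical deterministic multiple-choice knapsack problem that the name builds on can be written as follows. This is the textbook form, not the paper's exact formulation, and every symbol is an assumption introduced here for illustration: tasks i = 1..n, a set J_i of effort levels for task i, utility u_ij, token cost c_ij, and global token budget B. Skipping a task can be modeled as a choice with zero cost and zero utility.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{n}\sum_{j\in J_i} u_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{i=1}^{n}\sum_{j\in J_i} c_{ij}\,x_{ij} \le B, \\
& \sum_{j\in J_i} x_{ij} = 1, \quad i = 1,\dots,n, \\
& x_{ij}\in\{0,1\}.
\end{aligned}
```

The "ordered" and "stochastic" qualifiers suggest that tasks are handled in a fixed sequence and that costs and utilities are uncertain when each choice is made; the precise definition is given in the paper itself.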
Findings
Improves reasoning scores under strict token budgets
Reduces regret in computational resource allocation
Enhances the model's ability to predict task difficulty and utility
Abstract
Large language models (LLMs) can achieve strong reasoning performance with sufficient computation, but they do not inherently know how much computation a task requires. We study budgeted inference-time reasoning for multiple tasks under a strict global token constraint and formalize it as an Ordered Stochastic Multiple-Choice Knapsack Problem (OS-MCKP). This perspective highlights a meta-cognitive requirement -- anticipating task difficulty, estimating return over investment (ROI), and allocating computation strategically. We propose ROI-Reasoning, a two-stage framework that endows LLMs with intrinsic, budget-aware rationality. In the first stage, Meta-Cognitive Fine-Tuning teaches models to predict reasoning cost and expected utility before generation, enabling explicit solve-or-skip decisions. Next, Rationality-Aware Reinforcement Learning optimizes sequential decision making under a…
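The solve-or-skip idea in the abstract can be illustrated with a small sketch. It is not the paper's method: it is a hand-written threshold rule over predicted cost and utility, assuming those predictions are already available. The `Task` fields, the `solve_or_skip` function, and the `min_roi` threshold are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    predicted_cost: int       # hypothetical: predicted reasoning tokens
    predicted_utility: float  # hypothetical: predicted chance of a correct answer


def solve_or_skip(tasks, budget, min_roi=0.0):
    """Visit tasks in order; solve one only if it fits the remaining
    budget and its utility-per-token ratio reaches min_roi."""
    remaining = budget
    decisions = []
    for task in tasks:
        # ROI here is simply predicted utility per predicted token.
        roi = task.predicted_utility / max(task.predicted_cost, 1)
        if task.predicted_cost <= remaining and roi >= min_roi:
            decisions.append((task.name, "solve"))
            remaining -= task.predicted_cost
        else:
            decisions.append((task.name, "skip"))
    return decisions, remaining


# Illustrative numbers only; they are not taken from the paper.
tasks = [
    Task("easy", 200, 0.9),
    Task("hard", 1500, 0.3),
    Task("medium", 600, 0.7),
]
decisions, remaining = solve_or_skip(tasks, budget=1000, min_roi=0.001)
print(decisions, remaining)
```

With these made-up numbers the hard task is skipped because its predicted cost exceeds the remaining budget, which leaves room for the medium task. A learned policy, as the abstract describes, would replace the fixed threshold with decisions optimized through reinforcement learning.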
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization
