Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning
Xinhai Sun

TL;DR
This paper introduces Reinforcement Inference, a strategy that uses model uncertainty to improve reasoning accuracy by selectively re-invoking the model, significantly boosting performance without retraining.
Contribution
It presents a novel entropy-aware inference method that leverages the model's own uncertainty to enhance reasoning accuracy through selective re-asking, without additional training.
Findings
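Accuracy on 12,032 MMLU-Pro questions (14 subjects, DeepSeek-v3.2, zero-shot, deterministic decoding) improved from 60.72% to 84.03%.
Selective re-asking incurs only 61.06% additional inference calls yet captures most of the gain: re-asking on every question reaches 84.35%.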
Uncertainty-based control outperforms generic prompting approaches.
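To make "most of the gain" concrete, the short calculation below uses only the three accuracy figures quoted in the abstract. The variable names and the helper `share_of_gain` are illustrative and do not come from the paper.

```python
# Accuracy figures quoted in the abstract (percent, MMLU-Pro).
baseline = 60.72       # one-shot greedy inference
selective = 84.03      # entropy-gated selective re-asking
always_reask = 84.35   # 100% re-asking ablation


def share_of_gain(base: float, achieved: float, ceiling: float) -> float:
    """Fraction of the ceiling's improvement over base that achieved retains."""
    return (achieved - base) / (ceiling - base)


gain_selective = selective - baseline   # points gained by selective re-asking
gain_full = always_reask - baseline     # points gained by always re-asking
share = share_of_gain(baseline, selective, always_reask)

print(f"{gain_selective:.2f} of {gain_full:.2f} points ({share:.1%})")
```

On these figures, selective re-asking keeps roughly 98.6% of the improvement obtained by re-asking on every question.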
Abstract
Modern large language models (LLMs) are often evaluated and deployed under a one-shot, greedy inference protocol, especially in professional settings that require deterministic behavior. This regime can systematically underestimate a fixed model's true capability: many errors arise not from missing knowledge, but from premature commitment under internal ambiguity. We introduce Reinforcement Inference, an entropy-aware inference-time control strategy that uses the model's own uncertainty to selectively invoke a second, more deliberate reasoning attempt, enabling stronger performance without any retraining. On 12,032 MMLU-Pro questions across 14 subjects, using DeepSeek-v3.2 with deterministic decoding in a zero-shot setting, Reinforcement Inference improves accuracy from 60.72% to 84.03%, while incurring only 61.06% additional inference calls. A 100% re-asking ablation reaches 84.35%,…
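The control loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not say how entropy is measured, what threshold is used, or how the second attempt is prompted. The sketch assumes entropy is taken over a probability distribution on the answer options. The `AnswerFn` interface, the `threshold` value, and the retry prompt wording are all hypothetical.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical interface (not from the paper): a callable that takes a prompt
# and returns the chosen option plus a distribution over the answer options.
AnswerFn = Callable[[str], Tuple[str, Dict[str, float]]]


def shannon_entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over answer options."""
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalise first so slightly unnormalised inputs are handled.
    return -sum((p / total) * math.log(p / total)
                for p in probs.values() if p > 0)


def reinforcement_inference(question: str,
                            first_pass: AnswerFn,
                            second_pass: AnswerFn,
                            threshold: float = 0.5) -> Tuple[str, bool]:
    """Answer once; re-ask only when first-pass entropy exceeds the threshold.

    Returns (final_answer, was_reasked). The default threshold is an
    arbitrary placeholder, not a value reported in the paper.
    """
    answer, probs = first_pass(question)
    if shannon_entropy(probs) <= threshold:
        return answer, False  # confident: keep the one-shot answer

    # Uncertain: request a second, more deliberate attempt.
    # The wording of this prompt is an assumption made for illustration.
    retry_prompt = (question + "\n\nYour first answer was uncertain. "
                    "Reason step by step, then give a final answer.")
    retry_answer, _ = second_pass(retry_prompt)
    return retry_answer, True


# Stub models standing in for real LLM calls.
def confident_model(prompt: str) -> Tuple[str, Dict[str, float]]:
    return "A", {"A": 0.97, "B": 0.01, "C": 0.01, "D": 0.01}


def uncertain_model(prompt: str) -> Tuple[str, Dict[str, float]]:
    return "B", {"A": 0.30, "B": 0.35, "C": 0.20, "D": 0.15}


def deliberate_model(prompt: str) -> Tuple[str, Dict[str, float]]:
    return "C", {"A": 0.05, "B": 0.05, "C": 0.85, "D": 0.05}


easy = reinforcement_inference("Q1", confident_model, deliberate_model)
hard = reinforcement_inference("Q2", uncertain_model, deliberate_model)
print(easy, hard)
```

With the stubs above, the low-entropy question keeps its first answer and only the high-entropy question triggers a second call. Gating on uncertainty in this way is what keeps the extra inference cost below that of re-asking on every question.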
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
