Accelerating LLM Reasoning via Early Rejection with Partial Reward Modeling
Seyyed Saeid Cheshmi, Azal Ahmad Khan, Xinran Wang, Zirui Liu, Ali Anwar

TL;DR
This paper proposes a method to improve the computational efficiency of large language models in reasoning tasks by using partial reward models for early rejection of suboptimal solutions during generation.
Contribution
It introduces the concept of Partial Reward Models for early rejection, reducing inference costs without sacrificing reasoning quality.
Findings
Achieves up to 9x reduction in inference FLOPs on math reasoning benchmarks.
Strong correlation between partial and final rewards supports early rejection effectiveness.
Theoretically proves exponential decrease in risk of discarding optimal solutions.
Abstract
Large Language Models (LLMs) are increasingly relied upon for solving complex reasoning tasks in domains such as mathematics, logic, and multi-step question answering. A growing line of work seeks to improve reasoning quality by scaling inference-time compute, particularly through Process Reward Models (PRMs), which reward the reasoning at intermediate steps. While effective, these methods introduce substantial computational overhead, especially when generating large numbers of solutions in parallel. In this paper, we investigate whether PRMs can be used mid-generation to provide early signals that enable the rejection of suboptimal candidates before a step has been fully generated. We introduce the hypothesis that PRMs are also Partial Reward Models, meaning that the scores they assign to partially completed reasoning steps are predictive of final output quality. This allows for…
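The early-rejection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function `early_reject`, its parameters, and the three callables (`generate_prefix`, `score_partial`, `complete`) are hypothetical names introduced here, and the toy stand-ins replace a real LLM and PRM. The sketch only assumes that a partial step can be scored cheaply and that low-scoring candidates can be dropped before the expensive completion.

```python
from typing import Callable, List, Tuple


def early_reject(
    prompt: str,
    n_candidates: int,
    keep: int,
    generate_prefix: Callable[[str, int], str],   # hypothetical: short partial step
    score_partial: Callable[[str, str], float],   # hypothetical: PRM score of a partial step
    complete: Callable[[str, str], str],          # hypothetical: finish the step
) -> List[Tuple[str, float]]:
    """Generate short prefixes, score them, and finish only the top `keep`."""
    # 1. Cheap phase: produce a short partial step for every candidate.
    prefixes = [generate_prefix(prompt, i) for i in range(n_candidates)]
    # 2. Score each partial step with the reward model.
    scored = [(score_partial(prompt, p), p) for p in prefixes]
    # 3. Early rejection: keep only the highest-scoring prefixes.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    survivors = scored[:keep]
    # 4. Expensive phase: complete only the survivors.
    return [(complete(prompt, p), s) for s, p in survivors]


# Toy stand-ins (assumptions for illustration, not a real LLM or PRM).
def toy_prefix(prompt: str, i: int) -> str:
    return f"step-{i}"


def toy_score(prompt: str, prefix: str) -> float:
    return float(int(prefix.split("-")[1]) % 4)


def toy_complete(prompt: str, prefix: str) -> str:
    return prefix + "-done"


results = early_reject("2+2=?", 8, 2, toy_prefix, toy_score, toy_complete)
print(results)
```

In this toy run only 2 of the 8 candidates reach the completion phase. Whether such pruning is safe in practice depends on how well partial scores predict final scores, which is the hypothesis the paper examines.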
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
