On the Power of (Approximate) Reward Models for Inference-Time Scaling

Youheng Zhu; Yiping Lu

arXiv:2602.01381 · cs.CL · February 3, 2026

TL;DR

This paper analyzes why and when approximate reward models suffice to guide inference-time scaling in large language models, showing that a bounded Bellman error yields exponential efficiency gains while keeping complexity polynomial.

Contribution

It provides a theoretical framework linking Bellman error bounds of approximate reward models to the efficiency of SMC-based inference in large language models.

Findings

01

A Bellman error bounded by O(1/T) ensures polynomial complexity

02

Approximate reward models can achieve exponential efficiency gains

03

Theoretical justification for using approximate reward models in inference-time scaling

Abstract

Inference-time scaling has recently emerged as a powerful paradigm for improving the reasoning capability of large language models. Among various approaches, Sequential Monte Carlo (SMC) has become a particularly important framework, enabling iterative generation, evaluation, rejection, and resampling of intermediate reasoning trajectories. A central component in this process is the reward model, which evaluates partial solutions and guides the allocation of computation during inference. However, in practice, true reward models are never available. All deployed systems rely on approximate reward models, raising a fundamental question: Why and when do approximate reward models suffice for effective inference-time scaling? In this work, we provide a theoretical answer. We identify the Bellman error of the approximate reward model as the key quantity governing the effectiveness of…
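The generate, evaluate, and resample loop that the abstract attributes to SMC can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a toy setting in which tokens are bits, the base model proposes uniformly at random, the approximate reward decays with the number of mismatches against a hidden target, and particles are resampled multinomially at every step with weights equal to the reward ratio between the new and the old prefix. The names `smc_decode`, `uniform_bit`, `prefix_reward`, and `TARGET` are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Prefix = List[int]

# Hypothetical hidden target that the toy reward is built around.
TARGET: Prefix = [1, 0, 1, 1, 0, 0, 1, 0]


def uniform_bit(prefix: Sequence[int], rng: random.Random) -> int:
    """Stand-in for a base language model: proposes the next bit uniformly."""
    return rng.randint(0, 1)


def prefix_reward(prefix: Sequence[int]) -> float:
    """Assumed approximate reward for a partial solution.

    Each mismatch against TARGET multiplies the reward by 0.05.
    """
    mismatches = sum(a != b for a, b in zip(prefix, TARGET))
    return 0.05 ** mismatches


def smc_decode(
    propose: Callable[[Sequence[int], random.Random], int],
    reward: Callable[[Sequence[int]], float],
    num_particles: int,
    horizon: int,
    rng: random.Random,
) -> List[Prefix]:
    """Toy SMC loop: extend each particle, score it, then resample."""
    particles: List[Prefix] = [[] for _ in range(num_particles)]
    prev_scores = [reward(p) for p in particles]
    for _ in range(horizon):
        # Generation: every particle extends its prefix by one token.
        particles = [p + [propose(p, rng)] for p in particles]
        # Evaluation: incremental weight is the reward ratio new / old.
        scores = [reward(p) for p in particles]
        weights = [s / max(q, 1e-12) for s, q in zip(scores, prev_scores)]
        if sum(weights) <= 0.0:
            # Degenerate case: fall back to uniform weights.
            weights = [1.0] * num_particles
        # Resampling: multinomial, proportional to the incremental weights.
        idx = rng.choices(range(num_particles), weights=weights, k=num_particles)
        particles = [list(particles[i]) for i in idx]
        prev_scores = [scores[i] for i in idx]
    return particles


if __name__ == "__main__":
    rng = random.Random(0)
    final = smc_decode(uniform_bit, prefix_reward, 64, len(TARGET), rng)
    hits = sum(p == TARGET for p in final)
    print(f"{hits} of {len(final)} particles match the target")
```

In this toy setting, uniform sampling alone hits the 8-bit target with probability 2^-8 per sample, whereas the resampling step keeps concentrating particles on prefixes that the reward scores highly. The reward here is deliberately simple; how much error such a reward can tolerate is the question the paper addresses.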

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms