OptScale: Probabilistic Optimality for Inference-time Scaling

Youkang Wang; Jian Wang; Rubing Chen; Xiao-Yong Wei

arXiv:2506.22376 · cs.LG · December 22, 2025

Open Access

TL;DR

This paper introduces OptScale, a probabilistic framework and algorithm for principled inference-time scaling of LLMs, optimizing sampling efficiency while maintaining reasoning performance.

Contribution

The paper derives the first theoretical lower bound on the number of samples needed to reach a target performance level, and develops OptScale, a practical algorithm that dynamically determines the optimal number of samples during LLM inference.

Findings

1. OptScale reduces sampling overhead significantly.

2. It maintains or improves reasoning performance compared to state-of-the-art methods.

3. Experimental results on reasoning benchmarks validate the effectiveness of OptScale.

Abstract

Inference-time scaling has emerged as a powerful technique for enhancing the reasoning performance of Large Language Models (LLMs). However, existing approaches often rely on heuristic strategies for parallel sampling, lacking a principled foundation. To address this gap, we propose a probabilistic framework that formalizes the optimality of inference-time scaling under the assumption that parallel samples are independently and identically distributed (i.i.d.), and where the Best-of-$N$ selection strategy follows a probability distribution that can be estimated. Within this framework, we derive a theoretical lower bound on the required number of samples to achieve a target performance level, providing the first principled guidance for compute-efficient scaling. Leveraging this insight, we develop OptScale, a practical algorithm that dynamically determines the optimal number of…
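
To make the idea of a sample lower bound concrete, the sketch below works through a simplified order-statistics argument. It is an illustration under stated assumptions, not the paper's derivation or the OptScale algorithm. Assume each of the N i.i.d. samples falls below some quality threshold with probability `p_below` (a hypothetical input standing in for an estimated score CDF evaluated at the threshold). The best of N samples then reaches the threshold with probability 1 - p_below^N, so a target success probability `target` requires N >= log(1 - target) / log(p_below). The function name and parameters are chosen here for illustration only.

```python
import math


def min_samples(p_below: float, target: float) -> int:
    """Smallest N such that 1 - p_below**N >= target.

    p_below: assumed probability that one i.i.d. sample scores below the
             quality threshold (hypothetical input, not from the paper).
    target:  desired probability that the best of N samples reaches it.
    """
    if not 0.0 <= p_below < 1.0:
        raise ValueError("p_below must be in [0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if p_below == 0.0:
        return 1  # every sample reaches the threshold

    # Closed-form estimate from the inequality p_below**N <= 1 - target.
    n = max(1, math.ceil(math.log(1.0 - target) / math.log(p_below)))

    # Guard against floating-point rounding at exact boundaries.
    while n > 1 and 1.0 - p_below ** (n - 1) >= target:
        n -= 1
    while 1.0 - p_below ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9):
        print(f"p_below={p}: N >= {min_samples(p, 0.95)}")
```

The required N grows quickly as `p_below` approaches 1, that is, as a single sample becomes less likely to reach the threshold. This suggests why a sample budget that adapts to each question's estimated difficulty can cost less than a fixed N.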

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications