Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava

TL;DR
This paper casts inference-time scaling of large language models as a probabilistic inference task and adapts particle-based Monte Carlo methods to it, reporting a 4-16x better scaling rate than deterministic search methods.
Contribution
It frames inference-time scaling as sampling from the typical set of a state-space model with an approximate likelihood, rather than searching for the mode under a reward model, and adapts particle-based Monte Carlo methods to perform that sampling.
Findings
Achieves a 4-16x better scaling rate than deterministic search methods.
Qwen2.5-Math-1.5B-Instruct surpasses GPT-4o accuracy in 4 rollouts.
Qwen2.5-Math-7B-Instruct reaches o1 accuracy in 32 rollouts.
Abstract
Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating scaling the computation spent at inference time. Existing inference-time scaling methods, usually with reward models, cast the task as a search problem, which tends to be vulnerable to reward hacking as a consequence of approximation errors in reward models. In this paper, we instead cast inference-time scaling as a probabilistic inference task and leverage sampling-based techniques to explore the typical set of the state distribution of a state-space model with an approximate likelihood, rather than optimize for its mode directly. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods to this task. Our empirical evaluation demonstrates that our…
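The particle-based idea described in the abstract can be illustrated with a generic bootstrap particle filter. The following is a minimal sketch under stated assumptions, not the authors' implementation: `propose_step` is a hypothetical stand-in for an LLM sampling the next reasoning step, `score_partial` is a hypothetical stand-in for a reward model acting as the approximate likelihood, and the softmax weighting with multinomial resampling is one common choice rather than something the abstract specifies.

```python
import math
import random
from typing import Callable, List

Trajectory = List[int]  # toy stand-in for a partial chain of reasoning steps


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into normalized resampling weights."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def particle_filter(
    propose_step: Callable[[Trajectory, random.Random], int],
    score_partial: Callable[[Trajectory], float],
    n_particles: int,
    n_steps: int,
    rng: random.Random,
) -> List[Trajectory]:
    """Generic bootstrap particle filter over step-wise trajectories."""
    particles: List[Trajectory] = [[] for _ in range(n_particles)]
    for _ in range(n_steps):
        # 1. Propagate: extend every particle by one sampled step.
        particles = [p + [propose_step(p, rng)] for p in particles]
        # 2. Weight: score each partial trajectory (approximate likelihood).
        weights = softmax([score_partial(p) for p in particles])
        # 3. Resample: draw particles in proportion to their weights,
        #    instead of keeping only the top-scoring ones as a search would.
        idx = rng.choices(range(n_particles), weights=weights, k=n_particles)
        particles = [list(particles[i]) for i in idx]
    return particles


# Hypothetical toy components, for illustration only.
def toy_propose(prefix: Trajectory, rng: random.Random) -> int:
    return rng.randint(0, 9)  # a "step" is just a random digit


def toy_score(prefix: Trajectory) -> float:
    return prefix[-1] / 3.0  # pretend larger digits are better steps


if __name__ == "__main__":
    final = particle_filter(toy_propose, toy_score, n_particles=8,
                            n_steps=5, rng=random.Random(0))
    for trajectory in final:
        print(trajectory)
```

Because resampling is stochastic, a low-scoring particle can still survive a step. That is the intuition behind sampling from a distribution rather than optimizing for its mode: an imperfect scorer has less opportunity to steer every rollout toward the same over-rewarded path.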
Taxonomy
Topics: Non-Destructive Testing Techniques
Methods: Sparse Evolutionary Training
