Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

Isha Puri; Shivchander Sudalairaj; Guangxuan Xu; Kai Xu; Akash Srivastava

arXiv:2502.01618 · cs.LG · August 15, 2025

TL;DR

This paper treats inference-time scaling of large language models as a probabilistic inference problem and solves it with particle-based Monte Carlo methods, reporting 4-16x better scaling than deterministic search methods.

Contribution

The paper casts inference-time scaling as probabilistic inference over a state-space model with an approximate likelihood, and adapts particle-based Monte Carlo methods to sample from its typical set rather than optimize for its mode directly.

Findings

01. Achieves 4-16x better scaling than deterministic search methods.

02. Qwen2.5-Math-1.5B-Instruct surpasses GPT-4o accuracy in 4 rollouts.

03. Qwen2.5-Math-7B-Instruct reaches o1 accuracy in 32 rollouts.

Abstract

Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating scaling the computation spent at inference time. Existing inference-time scaling methods, usually with reward models, cast the task as a search problem, which tends to be vulnerable to reward hacking as a consequence of approximation errors in reward models. In this paper, we instead cast inference-time scaling as a probabilistic inference task and leverage sampling-based techniques to explore the typical set of the state distribution of a state-space model with an approximate likelihood, rather than optimize for its mode directly. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods to this task. Our empirical evaluation demonstrates that our…
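The propagate, weight, and resample loop behind particle-based Monte Carlo methods can be illustrated with a minimal sketch. This is a generic toy particle filter, not the authors' implementation: `toy_propose` is a hypothetical stand-in for an LLM generating the next step of a partial solution, `toy_score` is a hypothetical stand-in for a reward model acting as an approximate likelihood, and the function names, the target value, and the particle counts are all assumptions made for illustration.

```python
import math
import random


def particle_filter(init, propose, score, n_particles=8, n_steps=5, seed=0):
    """Generic particle filter: propagate, weight, then resample at each step."""
    rng = random.Random(seed)
    particles = [init for _ in range(n_particles)]
    for _ in range(n_steps):
        # Propagate: extend every partial trajectory by one sampled step.
        particles = [propose(p, rng) for p in particles]
        # Weight: score each trajectory with the approximate likelihood.
        weights = [score(p) for p in particles]
        # Resample: draw a new population in proportion to the weights, so
        # promising trajectories are duplicated and weak ones tend to die out,
        # without committing greedily to the single best-scoring one.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return particles


# Hypothetical stand-ins (assumptions, not from the paper).
TARGET = 20  # toy goal: the digits of a trajectory should sum to this value


def toy_propose(state, rng):
    """Stand-in for an LLM step: append one random digit to the trajectory."""
    return state + (rng.randrange(10),)


def toy_score(state):
    """Stand-in for a reward model: always positive, higher near the target."""
    return math.exp(-abs(sum(state) - TARGET) / 5.0)


if __name__ == "__main__":
    final = particle_filter((), toy_propose, toy_score, n_particles=16, n_steps=4)
    best = max(final, key=toy_score)
    print("best trajectory:", best, "score:", round(toy_score(best), 3))
```

Because resampling is stochastic, a trajectory with a mediocre score at an early step still has some chance of surviving, which is one way to see the contrast the abstract draws between sampling from a distribution and optimizing for its mode under an imperfect reward model.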

Code & Models

Repositories

Red-Hat-AI-Innovation-Team/its_hub (Official)

Taxonomy

Topics: Non-Destructive Testing Techniques

Methods: Sparse Evolutionary Training