Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning

Seyedarmin Azizi; Erfan Baghaei Potraghloo; Minoo Ahmadi; Souvik Kundu; Massoud Pedram

arXiv:2602.10273 · stat.ML · March 24, 2026

TL;DR

Power-SMC is a low-latency, training-free Sequential Monte Carlo (SMC) method for sequence-level power sampling in large language models. It matches or exceeds the reasoning performance of Metropolis-Hastings power sampling while cutting the inference slowdown relative to baseline decoding from 16-28x to 1.4-3.3x.

Contribution

The paper proposes Power-SMC, an SMC scheme that approximates sequence-level power sampling for LLM reasoning, reducing latency relative to Metropolis-Hastings sampling while maintaining or improving performance.

Findings

1. Power-SMC matches or exceeds the performance of Metropolis-Hastings (MH) power sampling.

2. It reduces the inference slowdown relative to baseline decoding from 16-28x for MH power sampling to 1.4-3.3x.

3. It provides a theoretical analysis of proposal distributions, together with stability improvements.
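As a worked illustration of why importance weights must be corrected token by token, here is a standard SMC identity written in this document's notation (our own derivation, not an equation quoted from the paper). Write the power target $\pi_{\alpha}(y \mid x) \propto p_{\theta}(y \mid x)^{\alpha}$ in autoregressive form through unnormalized intermediate targets $\gamma_t$. For any next-token proposal $q$, the incremental weight is the ratio of consecutive targets to the proposal:

$$
\gamma_t(y_{1:t}) = \prod_{s=1}^{t} p_{\theta}(y_s \mid x, y_{<s})^{\alpha},
\qquad
w_t = w_{t-1}\,\frac{\gamma_t(y_{1:t})}{\gamma_{t-1}(y_{<t})\,q(y_t \mid x, y_{<t})}
    = w_{t-1}\,\frac{p_{\theta}(y_t \mid x, y_{<t})^{\alpha}}{q(y_t \mid x, y_{<t})}.
$$

With the token-level tempered proposal $q(y_t \mid x, y_{<t}) = p_{\theta}(y_t \mid x, y_{<t})^{\alpha} / Z_t(x, y_{<t})$, which is the locally optimal proposal for these intermediate targets, the update no longer depends on the sampled token:

$$
w_t = w_{t-1}\,Z_t(x, y_{<t}),
\qquad
Z_t(x, y_{<t}) = \sum_{v \in \mathcal{V}} p_{\theta}(v \mid x, y_{<t})^{\alpha}.
$$

Because $\alpha > 1$, each $Z_t \le 1$, with equality only when the next-token distribution is one-hot, so prefixes that lead into confident states gain relative weight. Decoding at token-level temperature $1/\alpha$ draws exactly from $\prod_t q$ and drops the factor $\prod_t Z_t$, which is why temperature alone does not sample $\pi_{\alpha}$.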

Abstract

Many recent reasoning gains in large language models can be explained as distribution sharpening: biasing generation toward high-likelihood trajectories already supported by the pretrained model, rather than modifying its weights. A natural formalization is the sequence-level power distribution $\pi_{\alpha}(y \mid x) \propto p_{\theta}(y \mid x)^{\alpha}$ with $\alpha > 1$, which concentrates mass on whole sequences instead of adjusting token-level temperature. Prior work shows that Metropolis-Hastings (MH) sampling from this distribution recovers strong reasoning performance, but at order-of-magnitude inference slowdowns. We introduce Power-SMC, a training-free Sequential Monte Carlo scheme that targets the same objective while remaining close to standard decoding latency. Power-SMC advances a small particle set in parallel, corrects importance weights token-by-token, and resamples when necessary,…
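The loop described in the abstract (advance particles in parallel, correct weights token by token, resample when necessary) can be sketched on a toy model. This is a minimal illustration under our own assumptions, not the authors' implementation: `toy_next_token_probs` is a hypothetical hand-written next-token rule standing in for an LLM, sequences have a fixed length with no end-of-sequence handling, the proposal is the token-level tempered distribution $q \propto p_{\theta}^{\alpha}$ (whose incremental weight is $\sum_v p_{\theta}(v \mid x, y_{<t})^{\alpha}$), and multinomial resampling fires when the effective sample size drops below half the particle count. The paper's actual proposal, resampling rule, and stopping criteria may differ.

```python
import itertools
import math
import random

VOCAB = (0, 1, 2)  # hypothetical toy vocabulary


def toy_next_token_probs(prefix):
    """Hypothetical stand-in for p_theta(y_t | x, y_<t); not a real LLM.

    After token 0 the toy model is confident (it strongly prefers another 0);
    in every other state it is uniform, so Z_t = sum_v p(v | prefix)^alpha
    depends on the prefix.
    """
    logits = [3.0, 0.0, 0.0] if prefix and prefix[-1] == 0 else [0.0, 0.0, 0.0]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(log_w):
    """Turn log-weights into normalized weights (max-shifted for stability)."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    s = sum(w)
    return [x / s for x in w]


def power_smc(alpha, num_particles, length, ess_frac=0.5, seed=0):
    """SMC sketch targeting pi_alpha(y) ∝ p(y)^alpha over fixed-length sequences."""
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    log_w = [0.0] * num_particles
    for _ in range(length):
        for i, prefix in enumerate(particles):
            powered = [p ** alpha for p in toy_next_token_probs(prefix)]
            z = sum(powered)  # Z_t(y_<t) = sum_v p(v | y_<t)^alpha
            # Tempered proposal q(v) = p(v)^alpha / Z_t; random.choices
            # normalizes the weights itself.
            prefix.append(rng.choices(VOCAB, weights=powered)[0])
            # Incremental weight p(y_t)^alpha / q(y_t) = Z_t for any drawn token.
            log_w[i] += math.log(z)
        weights = normalize(log_w)
        ess = 1.0 / sum(w * w for w in weights)
        if ess < ess_frac * num_particles:
            # Multinomial resampling; copy lists so survivors do not alias.
            idx = rng.choices(range(num_particles), weights=weights, k=num_particles)
            particles = [list(particles[j]) for j in idx]
            log_w = [0.0] * num_particles
    return particles, normalize(log_w)


def exact_power_dist(alpha, length):
    """Brute-force pi_alpha by enumerating all |V|^length sequences (toy sizes only)."""
    scores = {}
    for seq in itertools.product(VOCAB, repeat=length):
        logp = sum(math.log(toy_next_token_probs(seq[:t])[seq[t]]) for t in range(length))
        scores[seq] = math.exp(alpha * logp)
    total = sum(scores.values())
    return {seq: s / total for seq, s in scores.items()}


if __name__ == "__main__":
    particles, weights = power_smc(alpha=4.0, num_particles=2000, length=3)
    est = sum(w for seq, w in zip(particles, weights) if seq == [0, 0, 0])
    print("SMC estimate of pi_alpha(0,0,0):", round(est, 3))
    print("exact pi_alpha(0,0,0):", round(exact_power_dist(4.0, 3)[(0, 0, 0)], 3))
```

On this toy with $\alpha = 4$, the exact power distribution puts about 0.96 of its mass on the sequence (0, 0, 0), versus about 0.28 under $p_{\theta}$ itself; the brute-force `exact_power_dist` exists only to check the SMC estimate and is infeasible for a real vocabulary. Setting `alpha=1.0` makes every $Z_t = 1$, so the weights stay uniform and the sketch reduces to ordinary ancestral sampling.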


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification