Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening

Xiaotong Ji; Rasul Tutunov; Matthieu Zimmer; Haitham Bou Ammar

arXiv:2601.21590·cs.LG·January 30, 2026

TL;DR

This paper introduces a training-free, scalable method for improving large language models' reasoning by approximating power distributions with a token-level scaling approach, eliminating the need for costly MCMC sampling.

Contribution

The authors propose a novel, theoretically grounded algorithm that sharpens LLMs' distributions without training or external rewards, significantly reducing inference latency.

Findings

01. Matches or surpasses one-shot GRPO performance on various tasks

02. Reduces inference latency by over 10x compared to MCMC-based sampling

03. Effective across math, QA, and code tasks with multiple LLMs

Abstract

Reinforcement learning (RL) post-training is a dominant approach for improving the reasoning performance of large language models (LLMs), yet growing evidence suggests that its gains arise primarily from distribution sharpening rather than the acquisition of new capabilities. Recent work has shown that sampling from the power distribution of LLMs using Markov chain Monte Carlo (MCMC) can recover performance comparable to RL post-training without relying on external rewards; however, the high computational cost of MCMC makes such approaches impractical for widespread adoption. In this work, we propose a theoretically grounded alternative that eliminates the need for iterative MCMC. We derive a novel formulation showing that the global power distribution can be approximated by a token-level scaled low-temperature one, where the scaling factor captures future trajectory quality. Leveraging…
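
To make the abstract's central claim concrete, here is a minimal toy sketch (not the authors' algorithm) of why a plain token-level low-temperature distribution differs from the sequence-level power distribution, and how a per-token scaling factor that summarizes the futures closes the gap. The two-step model, its probabilities, the exponent ALPHA, and all function names are assumptions made up for illustration. In this toy setting the scaling factor can be computed exactly by enumerating every continuation; the paper's contribution concerns approximating such a quantity at scale, which this sketch does not attempt.

```python
import numpy as np

ALPHA = 4.0  # assumed sharpening exponent; not a value taken from the paper

# Hypothetical two-step model over a two-token vocabulary.
p_first = np.array([0.6, 0.4])        # p(x1)
p_second = np.array([[0.5, 0.5],      # p(x2 | x1 = 0): uncertain future
                     [0.9, 0.1]])     # p(x2 | x1 = 1): confident future


def normalize(weights):
    """Rescale non-negative weights so they sum to one."""
    return weights / weights.sum()


def power_first_token_marginal(p_first, p_second, alpha):
    """First-token marginal of the sequence-level power distribution p(x1, x2)^alpha / Z."""
    joint = p_first[:, None] * p_second   # p(x1, x2) for every full sequence
    powered = joint ** alpha              # unnormalized power distribution
    return normalize(powered.sum(axis=1)) # sum out x2, then normalize


def low_temperature_first_token(p_first, alpha):
    """Plain token-level sharpening: p(x1)^alpha, renormalized."""
    return normalize(p_first ** alpha)


def scaled_low_temperature_first_token(p_first, p_second, alpha):
    """Token-level sharpening times a factor summarizing the sharpened futures."""
    future_scale = (p_second ** alpha).sum(axis=1)  # sum over x2 of p(x2 | x1)^alpha
    return normalize(p_first ** alpha * future_scale)


exact = power_first_token_marginal(p_first, p_second, ALPHA)
naive = low_temperature_first_token(p_first, ALPHA)
scaled = scaled_low_temperature_first_token(p_first, p_second, ALPHA)

print("power distribution marginal:", exact)
print("plain low temperature:      ", naive)
print("scaled low temperature:     ", scaled)
```

In this toy model the scaled low-temperature distribution reproduces the power-distribution marginal exactly, while plain low-temperature sampling favors the first token even though the second token leads to a more confident continuation.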

Code & Models

Models

Jarrodbarnes/KernelBench-RLVR-120b (model · 19 downloads · 2 likes)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications