Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening
Xiaotong Ji, Rasul Tutunov, Matthieu Zimmer, Haitham Bou Ammar

TL;DR
This paper introduces a training-free, scalable method for improving the reasoning of large language models: it approximates the power distribution with a token-level scaled low-temperature distribution, eliminating the need for costly MCMC sampling.
Contribution
The authors propose a novel, theoretically grounded algorithm that sharpens LLMs' distributions without training or external rewards, and that cuts inference latency by over 10x relative to MCMC-based sampling.
Findings
Matches or surpasses one-shot GRPO performance on various tasks
Reduces inference latency by over 10x compared to MCMC-based sampling
Effective across math, QA, and code tasks with multiple LLMs
Abstract
Reinforcement learning (RL) post-training is a dominant approach for improving the reasoning performance of large language models (LLMs), yet growing evidence suggests that its gains arise primarily from distribution sharpening rather than the acquisition of new capabilities. Recent work has shown that sampling from the power distribution of LLMs using Markov chain Monte Carlo (MCMC) can recover performance comparable to RL post-training without relying on external rewards; however, the high computational cost of MCMC makes such approaches impractical for widespread adoption. In this work, we propose a theoretically grounded alternative that eliminates the need for iterative MCMC. We derive a novel formulation showing that the global power distribution can be approximated by a token-level scaled low-temperature one, where the scaling factor captures future trajectory quality. Leveraging…
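To make the relationship between the global power distribution and token-level low-temperature sampling concrete, here is a minimal sketch on a toy two-token, two-step model. It is an illustration under stated assumptions, not the authors' algorithm: the probabilities, the exponent `ALPHA`, and all names are hypothetical, and the future-dependent scaling factor is computed by brute-force enumeration, which would be intractable for a real LLM. How the paper estimates this factor is not shown in this excerpt.

```python
# Toy check: the first-token marginal of the power distribution p(x)^alpha
# equals the low-temperature token weight p(x1)^alpha multiplied by a
# scaling factor that sums p(x2 | x1)^alpha over all continuations.
# All numbers and names below are hypothetical, chosen for illustration.

ALPHA = 4.0  # assumed power exponent

# Hypothetical "language model" over sequences of length 2.
P_FIRST = {"a": 0.6, "b": 0.4}
P_SECOND = {
    "a": {"a": 0.5, "b": 0.5},    # after "a", the continuation is uncertain
    "b": {"a": 0.95, "b": 0.05},  # after "b", the continuation is confident
}


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


# Global power distribution over full sequences: proportional to p(x1, x2)^alpha.
power_joint = normalize({
    (x1, x2): (P_FIRST[x1] * P_SECOND[x1][x2]) ** ALPHA
    for x1 in P_FIRST
    for x2 in P_SECOND[x1]
})

# First-token marginal under the global power distribution.
power_first = {
    x1: sum(p for (first, _), p in power_joint.items() if first == x1)
    for x1 in P_FIRST
}

# Plain token-level low-temperature sampling: proportional to p(x1)^alpha.
low_temp_first = normalize({x1: P_FIRST[x1] ** ALPHA for x1 in P_FIRST})

# Scaling factor per first token: sum of powered continuation probabilities.
future_factor = {
    x1: sum(q ** ALPHA for q in P_SECOND[x1].values()) for x1 in P_FIRST
}

# Low-temperature weights rescaled by the future-dependent factor.
scaled_first = normalize({
    x1: (P_FIRST[x1] ** ALPHA) * future_factor[x1] for x1 in P_FIRST
})

for token in P_FIRST:
    print(token, low_temp_first[token], scaled_first[token], power_first[token])
```

In this toy setup, plain low-temperature sampling favours the first token "a", while the power distribution favours "b" because its continuation is more confident. Multiplying the low-temperature weights by the future-dependent factor recovers the power-distribution marginal exactly.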
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
