Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation

Akiyoshi Tomihari; Issei Sato

arXiv:2605.04542·cs.LG·May 7, 2026

TL;DR

This paper reveals how power sampling, RL, and self-distillation are interconnected through the power distribution, enabling more efficient training and inference in large language models.

Contribution

The paper introduces power self-distillation, uses the power distribution to link sampling, RL, and distillation, and demonstrates the method's effectiveness on reasoning tasks.

Findings

01

The power distribution connects sampling, self-reward KL-regularized RL, and self-distillation.

02

Power self-distillation can match power sampling performance with lower inference cost.

03

Power sampling increases self-reward, with true reward improvements depending on reward alignment.
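
The third finding can be illustrated on a toy categorical distribution. The sketch below is not taken from the paper: the probabilities, the exponent, and the two "true reward" vectors are made-up assumptions, and the helper names are hypothetical. It sharpens a base distribution p into p^alpha (renormalized), then compares the expected self-reward E[log p] and the expected true reward before and after sharpening.

```python
import numpy as np

def power_distribution(p, alpha):
    """Return the distribution proportional to p**alpha."""
    weights = p ** alpha
    return weights / weights.sum()

def expected_value(q, values):
    """Expectation of `values` under the distribution q."""
    return float(np.sum(q * values))

# Hypothetical base model over four complete answers (illustrative numbers).
p = np.array([0.5, 0.3, 0.15, 0.05])
alpha = 4.0  # assumed sharpening exponent, alpha > 1
q = power_distribution(p, alpha)

# Self-reward: the model's own log-probability of each answer.
self_reward = np.log(p)
self_before = expected_value(p, self_reward)
self_after = expected_value(q, self_reward)

# Two hypothetical true rewards: one where the model's most likely answer
# is correct (aligned), one where the second most likely answer is correct.
aligned_true = np.array([1.0, 0.0, 0.0, 0.0])
misaligned_true = np.array([0.0, 1.0, 0.0, 0.0])

aligned_before = expected_value(p, aligned_true)
aligned_after = expected_value(q, aligned_true)
misaligned_before = expected_value(p, misaligned_true)
misaligned_after = expected_value(q, misaligned_true)

print("self-reward:", self_before, "->", self_after)
print("aligned true reward:", aligned_before, "->", aligned_after)
print("misaligned true reward:", misaligned_before, "->", misaligned_after)
```

In this toy setting the self-reward rises under sharpening, while the true reward rises only when the correct answer is the one the base model already prefers.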

Abstract

Recent analyses question whether reinforcement learning (RL) is responsible for strong reasoning in large language models (LLMs). At the same time, distillation and inference-time sampling, including power sampling, have emerged as effective ways to improve LLM performance. However, the relationship among RL, distillation, and sampling remains unclear. In this study, we focus on the power distribution, the target distribution of power sampling, and show that the power distribution bridges sampling, self-reward KL-regularized RL, and self-distillation. From the sampling perspective, we show that inexpensive local approximations cannot reproduce sequence-level power without information about possible suffixes. From the RL perspective, the power distribution is the closed-form optimizer of KL-regularized RL when the model's sequence-level log-probabilities are used as the reward. This…
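
Two claims in the abstract can be checked numerically on a toy model. The sketch below rests on assumptions that may differ from the paper's exact setup: the two-token base model is made up, the objective is written as E_q[log p] - beta * KL(q || p), and under that parametrization the standard closed-form optimizer is proportional to p^(1 + 1/beta). It first confirms that no randomly drawn distribution scores higher than this power distribution, then shows that raising each next-token conditional to the same exponent (a purely local sharpening that ignores possible suffixes) gives a different sequence distribution.

```python
import numpy as np

def normalize(weights):
    return weights / weights.sum()

def kl_regularized_objective(q, p, beta):
    """E_q[log p] - beta * KL(q || p), using log p as the reward."""
    return float(np.sum(q * np.log(p)) - beta * np.sum(q * np.log(q / p)))

# Hypothetical base model over two-token sequences (illustrative numbers).
# Rows index the first token, columns index the second token.
first = np.array([0.6, 0.4])
second_given_first = np.array([[0.5, 0.5],
                               [0.9, 0.1]])
p = (first[:, None] * second_given_first).ravel()  # sequence probabilities

beta = 0.5
alpha = 1.0 + 1.0 / beta  # exponent implied by the assumed parametrization

# Sequence-level power distribution: the claimed closed-form optimizer.
power = normalize(p ** alpha)
power_value = kl_regularized_objective(power, p, beta)

# Random candidate distributions should not beat the closed-form solution.
rng = np.random.default_rng(0)
candidates = rng.dirichlet(np.ones(p.size), size=1000)
best_random = max(kl_regularized_objective(q, p, beta) for q in candidates)

# Local approximation: sharpen each conditional separately, then multiply.
local_first = normalize(first ** alpha)
local_second = second_given_first ** alpha
local_second = local_second / local_second.sum(axis=1, keepdims=True)
local = (local_first[:, None] * local_second).ravel()

print("objective at power distribution:", power_value)
print("best objective among random candidates:", best_random)
print("sequence-level power:", power)
print("token-level sharpening:", local)
```

In this example the two distributions even disagree on the most likely sequence: the sequence-level power favors the sequence whose first token is less likely but whose continuation is nearly certain, which the local rule cannot see when it picks the first token.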
