ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment

Xiuyu Li; Jinkai Zhang; Mingyang Yi; Yu Li; Longqiang Wang; Yue Wang; Ju Fan

arXiv:2601.21484 · cs.LG · May 20, 2026

TL;DR

ETS is a training-free, energy-guided inference method that samples directly from the optimal RL policy of a language model, improving generation quality without costly RL post-training.

Contribution

The paper proposes a test-time scaling algorithm that estimates the energy term of the optimal RL policy's transition probability via online Monte Carlo, with a provable convergence rate, enabling RL alignment without additional training.

Findings

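01. ETS improves generation quality across multiple benchmarks.

02. ETS substantially reduces inference latency by combining modern acceleration frameworks with tailored importance sampling estimators, while provably preserving sampling quality.

03. Experiments validate the effectiveness and efficiency of ETS.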

Abstract

Reinforcement Learning (RL) post-training alignment for language models is effective, but also costly and unstable in practice, owing to its complicated training process. To address this, we propose a training-free inference method to sample directly from the optimal RL policy. The transition probability applied to Masked Language Modeling (MLM) consists of a reference policy model and an energy term. Based on this, our algorithm, Energy-Guided Test-Time Scaling (ETS), estimates the key energy term via online Monte Carlo, with a provable convergence rate. Moreover, to ensure practical efficiency, ETS leverages modern acceleration frameworks alongside tailored importance sampling estimators, substantially reducing inference latency while provably preserving sampling quality. Experiments on MLM (including autoregressive models and diffusion language models) across reasoning, coding, and…
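
The abstract describes the transition probability as a reference policy multiplied by an energy term that is estimated by Monte Carlo. The toy sketch below illustrates that idea under stated assumptions and is not the authors' implementation: it assumes the standard KL-regularized form, in which the optimal policy is proportional to the reference policy times exp(reward / beta), and it estimates the energy of each candidate token by averaging exp(reward / beta) over rollouts drawn from the reference policy. All names (`ref_probs`, `reward`, `energy_estimate`, `guided_step_probs`, `BETA`) are hypothetical, the reference policy and reward are toy stand-ins, and the importance sampling and acceleration components mentioned in the abstract are omitted.

```python
import math
import random

VOCAB = [0, 1, 2]   # toy vocabulary (assumption)
SEQ_LEN = 4         # toy fixed sequence length (assumption)
BETA = 0.5          # hypothetical KL-regularization temperature


def ref_probs(prefix):
    """Toy reference policy: uniform over VOCAB, ignoring the prefix."""
    return [1.0 / len(VOCAB)] * len(VOCAB)


def reward(seq):
    """Toy reward: fraction of tokens in the full sequence equal to 2."""
    return sum(1 for tok in seq if tok == 2) / len(seq)


def rollout(prefix, rng):
    """Complete a prefix to SEQ_LEN tokens by sampling from the reference policy."""
    seq = list(prefix)
    while len(seq) < SEQ_LEN:
        seq.append(rng.choices(VOCAB, weights=ref_probs(seq))[0])
    return seq


def energy_estimate(prefix, num_samples, rng):
    """Monte Carlo estimate of E_ref[exp(reward(y) / BETA) | prefix]."""
    total = 0.0
    for _ in range(num_samples):
        total += math.exp(reward(rollout(prefix, rng)) / BETA)
    return total / num_samples


def guided_step_probs(prefix, num_samples, rng):
    """Next-token distribution: reference probability times estimated energy, normalized."""
    base = ref_probs(prefix)
    weights = [
        p * energy_estimate(list(prefix) + [tok], num_samples, rng)
        for tok, p in zip(VOCAB, base)
    ]
    norm = sum(weights)
    return [w / norm for w in weights]


def sample_sequence(num_samples=64, seed=0):
    """Sample one sequence token by token from the energy-guided transitions."""
    rng = random.Random(seed)
    seq = []
    while len(seq) < SEQ_LEN:
        probs = guided_step_probs(seq, num_samples, rng)
        seq.append(rng.choices(VOCAB, weights=probs)[0])
    return seq
```

Calling `sample_sequence()` draws one sequence from the guided transitions. In this toy setting the guided distribution shifts probability mass toward token 2, because prefixes containing it lead to higher expected exp(reward / BETA). At the final position no rollout randomness remains, so the energy estimate there is exact.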

Code & Models

Repositories

sheriyuo/ETS (GitHub)
