Entropy Aware Reward Guidance for Diffusion Language Model Alignment

Atula Tejaswi; Litu Rout; Constantine Caramanis; Sanjay Shakkottai; Sujay Sanghavi

arXiv:2602.05000·cs.LG·May 14, 2026

TL;DR

This paper introduces EntRGi, an entropy-aware reward guidance method for discrete diffusion language models. The method improves both test-time adaptation and reward-guided reinforcement learning.

Contribution

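EntRGi interpolates, token by token, between continuous token relaxations and sampled hard tokens, weighting the two by the diffusion model's predictive entropy. This maintains both reward model reliability and optimization accuracy, whereas existing approaches sacrifice one for the other.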
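
The per-token interpolation described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `entropy_aware_mix`, the use of normalized entropy as the mixing weight, and the direction of the mix (higher entropy leans toward the sampled hard token) are all assumptions made for illustration. The source only states that the interpolation is driven by the model's predictive entropy.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_aware_mix(logits, embeddings, rng):
    """Hypothetical sketch: mix soft and hard token embeddings per position.

    logits:     (seq_len, vocab) denoiser outputs for each position
    embeddings: (vocab, dim) token embedding table of the reward model
    rng:        numpy random Generator used to sample hard tokens
    """
    probs = softmax(logits)
    vocab = probs.shape[-1]
    # Predictive entropy per position, normalized to [0, 1] by log(vocab).
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    weight = np.clip(entropy / np.log(vocab), 0.0, 1.0)
    # Continuous relaxation: expected embedding under the predicted distribution.
    soft = probs @ embeddings
    # Hard tokens: one sample per position from the same distribution.
    tokens = np.array([rng.choice(vocab, p=p) for p in probs])
    hard = embeddings[tokens]
    # ASSUMPTION: uncertain positions rely more on the hard token,
    # confident positions rely more on the relaxation.
    mixed = (1.0 - weight)[:, None] * soft + weight[:, None] * hard
    return mixed, weight, tokens
```

In a gradient-based setting the hard branch would need something like a straight-through estimator so that the reward gradient still reaches the logits; that part is omitted here because the source does not specify it.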

Findings

01

EntRGi outperforms existing methods on 7B-parameter models.

02

It improves test-time adaptation and reward-guided reinforcement learning.

03

Empirical results show consistent performance gains in both settings.

Abstract

Reward guidance, also known as posterior sampling, is a popular method for test-time adaptation and post-training in continuous diffusion models. In this paper, we study reward guidance for discrete diffusion language models, where one cannot differentiate through the model's natural outputs because they are discrete tokens. We introduce a novel mechanism called EntRGi (Entropy aware Reward Guidance) to address this issue. EntRGi dynamically interpolates between continuous token relaxations and sampled hard tokens, on a token-by-token basis, using the diffusion model's predictive entropy. We demonstrate that EntRGi maintains both reward model reliability and optimization accuracy, while existing approaches sacrifice one for the other. We empirically validate our approach on 7B-parameter diffusion language models across two settings: (1) test-time adaptation, and (2) RGRL (Reward…
