DPRM: A Plug-in Doob h transform-induced Token-Ordering Module for Diffusion Language Models

Dake Bu; Wei Huang; Andi Han; Hau-San Wong; Qingfu Zhang; Taiji Suzuki; Atsushi Nitanda

arXiv:2604.24357 · cs.LG · April 28, 2026

TL;DR

DPRM is a plug-in token-ordering module for diffusion language models. It changes only the policy that decides which tokens are revealed, retained, revised or verified at each step, and it outperforms confidence-based ordering baselines in pretraining and test-time scaling.

Contribution

The paper proposes a plug-in token-ordering method based on the Doob h-transform. The method keeps the host architecture, denoising objective and supervision unchanged, and modifies only the token reveal policy.

Findings

01

DPRM outperforms confidence-based baselines in pretraining and test-time scaling.

02

It achieves significant gains on harder reasoning subsets.

03

In bioinformatics applications, DPRM improves structural and fragment-constrained metrics.

Abstract

Diffusion language models generate without a fixed left-to-right order, making token ordering a central algorithmic choice: which tokens should be revealed, retained, revised or verified at each step? Existing systems mainly use random masking or confidence-driven ordering. Random masking creates train-test mismatch, while confidence-only rules are efficient but can be myopic and suppress useful exploration. We introduce DPRM (Doob h-transform Process Reward Model), a plug-in token-ordering module for diffusion language models. DPRM keeps the host architecture, denoising objective and supervision unchanged, and changes only the ordering policy. It starts from confidence-driven progressive ordering and gradually shifts to Doob h-transform Process Reward guided ordering through online estimates. We characterize the exact DPRM policy as a reward-tilted Gibbs reveal law, prove O(1/N)…
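
The abstract describes a reveal policy that starts from confidence-driven ordering and shifts toward a reward-tilted Gibbs law. The sketch below is only a generic illustration of exponential tilting, not the paper's exact policy or estimator, which this page does not reproduce. The function names `reveal_distribution` and `mix_at`, the linear schedule, and the parameters `mix`, `reward_estimate` and `temperature` are all hypothetical.

```python
import math


def mix_at(step, total_steps):
    """Hypothetical linear schedule: 0.0 (confidence only) up to 1.0 (fully tilted)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(1.0, max(0.0, step / total_steps))


def reveal_distribution(confidence, reward_estimate, mix, temperature=1.0):
    """Probability of revealing each masked position next.

    Assumed form (illustrative): p(i) is proportional to
    confidence[i] * exp(mix * reward_estimate[i]), sharpened by 1/temperature.
    With mix == 0 and temperature == 1 this reduces to normalized confidence.
    """
    if len(confidence) != len(reward_estimate):
        raise ValueError("confidence and reward_estimate must have equal length")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    if any(c <= 0.0 for c in confidence):
        raise ValueError("confidence values must be strictly positive")

    # Log-score per position: base (confidence) term plus the reward tilt.
    scores = [
        (math.log(c) + mix * r) / temperature
        for c, r in zip(confidence, reward_estimate)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    conf = [0.5, 0.25, 0.25]
    reward = [0.0, math.log(2.0), 0.0]  # made-up reward estimates
    for step in (0, 5, 10):
        lam = mix_at(step, 10)
        print(step, reveal_distribution(conf, reward, lam))
```

At `mix = 0` the second position is revealed with probability proportional to its confidence alone. As `mix` grows, its higher reward estimate pulls probability toward it, which is the qualitative behaviour the abstract attributes to moving away from confidence-only ordering.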

Code & Models

Repositories

DakeBU/DPRM-DLLM
github
