Diffusion-State Policy Optimization for Masked Diffusion Language Models

Daisuke Oba; Hiroki Furuta; Naoaki Okazaki

arXiv:2602.06462·cs.CL·May 20, 2026

TL;DR

DiSPO adds a credit-assignment layer to masked diffusion language models that optimizes intermediate filling decisions directly, improving performance on math and planning benchmarks.

Contribution

The paper proposes DiSPO, a plug-in credit-assignment layer that directly optimizes the intermediate filling decisions of masked diffusion models instead of relying only on terminal rewards for final completions.

Findings

01

DiSPO improves over the baseline on math and planning tasks.

02

It requires no additional multi-step diffusion rollouts or optimizer steps.

03

It can serve as a general plug-in for masked diffusion policy optimization.

Abstract

Masked diffusion language models generate text through iterative masked-token filling, but terminal-only rewards on final completions provide coarse credit assignment for the intermediate filling decisions that shape the generation process. We propose Diffusion-State Policy Optimization (DiSPO), a plug-in credit-assignment layer that directly optimizes intermediate filling decisions. At selected intermediate masked states, DiSPO branches by resampling the currently masked positions from rollout-cached logits, scores the resulting completions, and updates only the newly filled tokens, requiring no additional multi-step diffusion rollouts or optimizer steps. We formalize a fixed-state objective for branched completions and derive a policy-gradient estimator that reuses the same rollouts as terminal-feedback policy optimization. Experiments on LLaDA-8B-Instruct show that DiSPO consistently…
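The branching step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `MASK_ID`, `branch_from_state`, `branch_surrogate`, the toy reward, and the group-mean baseline are all hypothetical choices made for illustration, and the surrogate is a plain REINFORCE-style form rather than the estimator derived in the paper. The sketch only shows the three ideas named above: resample the masked positions from cached logits, score each completion, and restrict the update signal to the newly filled tokens.

```python
import numpy as np

MASK_ID = -1  # hypothetical sentinel marking a still-masked position


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def branch_from_state(tokens, cached_logits, rng, num_branches=4):
    """Fill every masked position in one shot by sampling from cached logits.

    tokens:        (L,) int array, MASK_ID where the state is still masked.
    cached_logits: (L, V) float array saved during the original rollout.
    Returns (branches, masked): a (num_branches, L) int array of completions
    and the (L,) boolean mask of positions that were newly filled.
    """
    masked = tokens == MASK_ID
    probs = np.exp(log_softmax(cached_logits[masked]))  # (M, V)
    branches = np.tile(tokens, (num_branches, 1))
    for b in range(num_branches):
        branches[b, masked] = [rng.choice(len(p), p=p) for p in probs]
    return branches, masked


def branch_surrogate(branches, masked, logits, rewards):
    """REINFORCE-style surrogate over the newly filled tokens only.

    Assumption of this sketch: advantages use a group-mean baseline.
    Tokens that were already unmasked contribute nothing to the sum.
    """
    adv = rewards - rewards.mean()
    pos = np.flatnonzero(masked)
    token_logp = log_softmax(logits)[pos[None, :], branches[:, pos]]  # (B, M)
    return float(-(adv[:, None] * token_logp).sum(axis=1).mean())


# Toy usage: sequence length 4, vocabulary size 8, two masked positions.
rng = np.random.default_rng(0)
tokens = np.array([5, MASK_ID, 2, MASK_ID])
cached_logits = rng.normal(size=(4, 8))
branches, masked = branch_from_state(tokens, cached_logits, rng)
rewards = (branches == 2).mean(axis=1)  # hypothetical toy reward
loss = branch_surrogate(branches, masked, cached_logits, rewards)
```

In actual training the log-probabilities would come from a differentiable model so that the surrogate yields a gradient. NumPy here only evaluates the scalar value, which is enough to show which tokens enter the sum.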

Code & Models

Repositories

https://daioba.github.io/dispo
