Learning Unmasking Policies for Diffusion Language Models
Metod Jazbec, Theo X. Olausson, Louis Béthune, Pierre Ablin, Michael Kirchhof, João Monteiro, Victor Turrisi, Jason Ramapuram, Marco Cuturi

TL;DR
This paper introduces reinforcement learning-trained unmasking policies for diffusion language models, improving sampling efficiency and performance over heuristic methods, especially in full-diffusion scenarios.
Contribution
The paper formalizes masked diffusion sampling as a Markov decision process and proposes a lightweight transformer policy trained via reinforcement learning. The trained policy matches heuristics in semi-autoregressive generation and outperforms them in full-diffusion sampling.
Findings
Trained policies match heuristic performance in semi-autoregressive generation.
Trained policies outperform heuristics in full-diffusion sampling.
Reinforcement learning improves diffusion sampling efficiency.
Abstract
Diffusion (Large) Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks, while holding the promise of being more efficient during inference. One critical design aspect of dLLMs is the sampling procedure that selects which tokens to unmask at each diffusion step. Indeed, recent work has found that heuristic strategies such as confidence thresholding improve both sample quality and token throughput compared to random unmasking. However, such heuristics have downsides: they require manual tuning, and we observe that their performance degrades with larger block sizes. In this work, we instead propose to train sampling procedures using reinforcement learning. Specifically, we formalize masked diffusion sampling as a Markov decision process in which the dLLM serves as the environment, and propose a lightweight policy based on a…
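To make the confidence-thresholding heuristic mentioned in the abstract concrete, here is a minimal sketch of one unmasking step in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the names `MASK`, `softmax`, and `confidence_threshold_step` are hypothetical, the per-position logits stand in for the output of a dLLM forward pass, and the fallback that unmasks the single most confident position when nothing clears the threshold is an assumption added here so that every step makes progress.

```python
import math
from typing import List, Optional, Sequence

MASK = None  # hypothetical marker for a position that is still masked


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_threshold_step(
    tokens: Sequence[Optional[int]],
    logits: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[Optional[int]]:
    """Unmask every masked position whose top-token probability >= threshold.

    `logits[pos]` is assumed to hold the model's vocabulary logits for
    position `pos`. Already unmasked positions are left untouched.
    """
    candidates = []  # (confidence, position, argmax token id)
    for pos, tok in enumerate(tokens):
        if tok is not MASK:
            continue
        probs = softmax(logits[pos])
        best = max(range(len(probs)), key=probs.__getitem__)
        candidates.append((probs[best], pos, best))

    if not candidates:
        return list(tokens)  # nothing left to unmask

    chosen = [c for c in candidates if c[0] >= threshold]
    if not chosen:
        # Assumed fallback: unmask the single most confident position.
        chosen = [max(candidates)]

    out = list(tokens)
    for _, pos, tok in chosen:
        out[pos] = tok
    return out


if __name__ == "__main__":
    tokens = [7, MASK, MASK]
    logits = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
    print(confidence_threshold_step(tokens, logits, threshold=0.9))
```

In this toy call, position 1 has a top-token probability of about 0.99 and is unmasked, while position 2 (about 0.36) stays masked for a later step. The `threshold` value is the kind of hand-tuned parameter the abstract refers to; a learned policy would replace this fixed rule with a decision conditioned on the current state.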
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
