dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning
Shirui Chen, Jiantao Jiao, Lillian J. Ratliff, Banghua Zhu

TL;DR
dUltra introduces an on-policy reinforcement learning framework to optimize unmasking strategies in diffusion language models, significantly improving parallel token generation efficiency and accuracy over existing methods.
Contribution
It presents a novel RL-based approach with a learned unmasking planner that enhances parallel decoding in diffusion language models, surpassing prior heuristic and distillation methods.
Findings
dUltra achieves better accuracy-efficiency trade-offs on reasoning and code tasks.
The learned unmasking trajectories outperform heuristic baselines.
Code and checkpoints are publicly available.
Abstract
Masked diffusion language models (MDLMs) offer the potential for parallel token generation, but most open-source MDLMs decode fewer than 5 tokens per model forward pass even with sophisticated sampling strategies, limiting their parallel generation potential. Existing acceleration methods either rely on fixed confidence-based heuristics or use distillation-based approaches that finetune MDLMs on trajectories generated by a base model, which can become off-policy during finetuning and restrict performance to the quality of the base model's samples. We propose dUltra, an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding. dUltra introduces an unmasking planner head that predicts per-token unmasking likelihoods under independent Bernoulli distributions. We jointly optimize…
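The two mechanisms named in the abstract, per-token Bernoulli unmasking and group-relative advantages, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sample_unmask` and `group_relative_advantages`, the clamping constant, and the use of a population standard deviation are all assumptions made for illustration, and the reward values in the usage note are invented.

```python
import math
import random
from statistics import mean, pstdev


def sample_unmask(probs, masked, rng):
    """Decide which masked positions to reveal in one decoding step.

    Each masked position i is revealed independently with probability
    probs[i] (a stand-in for a planner head's output; hypothetical).
    Returns the revealed positions and the log-probability of the joint
    decision, which a policy-gradient method would need.
    """
    chosen, log_prob = [], 0.0
    for i in masked:
        # Clamp away from 0 and 1 so the logarithms stay finite (assumed detail).
        p = min(max(probs[i], 1e-6), 1.0 - 1e-6)
        if rng.random() < p:
            chosen.append(i)
            log_prob += math.log(p)
        else:
            log_prob += math.log(1.0 - p)
    return chosen, log_prob


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of rollouts.

    The population standard deviation is an arbitrary choice for this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, calling `sample_unmask([0.9, 0.1, 0.5, 0.99], [0, 2, 3], random.Random(0))` returns a subset of the masked positions together with a non-positive log-probability, and `group_relative_advantages([1.0, 0.0, 1.0, 0.0])` assigns positive advantages to the higher-reward rollouts and negative ones to the rest. Because each position is sampled independently, the joint log-probability is a simple sum over positions, which is what makes the Bernoulli factorization convenient for on-policy training.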
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
