dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

Shirui Chen; Jiantao Jiao; Lillian J. Ratliff; Banghua Zhu

arXiv:2512.21446·cs.LG·February 9, 2026

TL;DR

dUltra is an on-policy reinforcement learning framework that learns unmasking strategies for diffusion language models, improving the accuracy-efficiency trade-off of parallel token generation over heuristic and distillation-based methods.

Contribution

The paper presents an RL-based approach with a learned unmasking planner that improves parallel decoding in diffusion language models and outperforms prior heuristic and distillation methods.

Findings

01. dUltra achieves better accuracy-efficiency trade-offs on reasoning and code tasks.

02. The learned unmasking trajectories outperform heuristic baselines.

03. Code and checkpoints are publicly available.

Abstract

Masked diffusion language models (MDLMs) offer the potential for parallel token generation, but most open-source MDLMs decode fewer than 5 tokens per model forward pass even with sophisticated sampling strategies, limiting their parallel generation potential. Existing acceleration methods either rely on fixed confidence-based heuristics or use distillation-based approaches that finetune MDLMs on trajectories generated by a base model, which can become off-policy during finetuning and restrict performance to the quality of the base model's samples. We propose dUltra, an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding. dUltra introduces an unmasking planner head that predicts per-token unmasking likelihoods under independent Bernoulli distributions. We jointly optimize…
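The two ingredients named in the abstract, independent Bernoulli unmasking decisions and GRPO's group-relative advantages, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of raw logits as planner output, and the toy rewards are all assumptions made for illustration, and the jointly optimized objective (truncated in the abstract above) is not reproduced here.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_unmask_decisions(planner_logits, is_masked, rng):
    """Sample one independent Bernoulli unmask decision per masked position.

    planner_logits: hypothetical per-token planner outputs (assumed to be logits).
    is_masked: True where the token is still masked.
    Returns the decisions and the summed log-probability of the sampled
    decisions, which a policy-gradient method would weight by an advantage.
    """
    decisions, log_prob = [], 0.0
    for logit, masked in zip(planner_logits, is_masked):
        if not masked:
            decisions.append(False)  # already revealed, nothing to decide
            continue
        p = sigmoid(logit)
        unmask = rng.random() < p
        decisions.append(unmask)
        log_prob += math.log(p if unmask else 1.0 - p)
    return decisions, log_prob


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of rollouts."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, -1.0, 0.5, 3.0]        # toy planner outputs (assumed)
    masked = [True, False, True, True]    # position 1 is already unmasked
    print(sample_unmask_decisions(logits, masked, rng))
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # toy rewards
```

Because each position is sampled independently, the log-probability of a whole unmasking step is a simple sum over masked positions, which is what makes a per-token Bernoulli planner convenient to train with a policy-gradient method.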

Code & Models

Models

🤗 sengi/dUltra-math-b128 (model · 5 downloads)

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis