DMax: Aggressive Parallel Decoding for dLLMs

Zigeng Chen; Gongfan Fang; Xinyin Ma; Ruonan Yu; Xinchao Wang

arXiv:2604.08302·cs.LG·May 18, 2026




TL;DR

DMax introduces a parallel decoding paradigm for diffusion language models (dLLMs) that combines self-refinement in embedding space with a unified training strategy. It lets the model decode many more tokens per forward pass on GSM8K and MBPP while preserving generation quality.

Contribution

It proposes a new decoding method (Soft Parallel Decoding) and a new training method (On-Policy Uniform Training) for dLLMs, which together enable aggressive parallel decoding while preserving generation quality.

Findings

01

Improves tokens per forward pass (TPF) on GSM8K from 2.04 to 5.47.

02

Increases TPF on MBPP from 2.71 to 5.86.

03

Achieves 1,338 tokens per second (TPS) on H200 GPUs at batch size 1.
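TPF (tokens per forward pass) is the average number of tokens a model commits per forward pass, so the reported values translate directly into how many passes a response needs. A small sketch of that arithmetic follows; the 256-token response length is a hypothetical example, and the pass counts are rounded-up approximations because TPF is an average.

```python
import math

# TPF values reported in the findings: (before, after) DMax.
REPORTED_TPF = {"GSM8K": (2.04, 5.47), "MBPP": (2.71, 5.86)}


def tpf_gain(before: float, after: float) -> float:
    """How many times more tokens are decoded per forward pass after the change."""
    return after / before


def forward_passes(num_tokens: int, tpf: float) -> int:
    """Approximate forward passes needed to emit num_tokens at an average TPF."""
    return math.ceil(num_tokens / tpf)


# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 256

for bench, (before, after) in REPORTED_TPF.items():
    print(
        f"{bench}: {tpf_gain(before, after):.2f}x tokens per pass, "
        f"~{forward_passes(RESPONSE_TOKENS, before)} -> "
        f"~{forward_passes(RESPONSE_TOKENS, after)} passes for {RESPONSE_TOKENS} tokens"
    )
```

On GSM8K this works out to roughly 2.68x more tokens per pass (about 126 versus 47 passes for a 256-token answer). TPS additionally depends on per-pass latency and hardware, so it cannot be derived from TPF alone.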

Abstract

We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs that decode through a binary mask-to-token transition, DMax reformulates decoding as a progressive self-refinement from mask embeddings to token embeddings. At the core of our approach is On-Policy Uniform Training, a novel training strategy that efficiently unifies masked and uniform dLLMs, equipping the model to recover clean tokens from both masked inputs and its own erroneous predictions. Building on this foundation, we further propose Soft Parallel Decoding. We represent each intermediate decoding state as an interpolation between the predicted token embedding and the mask embedding, enabling iterative self-revising in embedding space.…
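The abstract describes Soft Parallel Decoding only at a high level: each intermediate state is an interpolation between the predicted token embedding and the mask embedding, refined iteratively. Below is a minimal sketch of that idea under stated assumptions, not the authors' implementation (see the czg1225/DMax repository for that). Assumptions: the interpolation weight is the model's max-probability confidence, every position is re-predicted on every forward pass, and decoding stops once all positions exceed a confidence threshold. `soft_state`, `soft_parallel_decode` and `toy_model` are hypothetical names, and `toy_model` is a stand-in for a real dLLM. On-Policy Uniform Training is not modeled here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state(probs: Sequence[float], vocab_emb: Sequence[Vector],
               mask_emb: Vector) -> Tuple[int, float, Vector]:
    """Interpolate between the predicted token's embedding and the mask embedding.

    ASSUMPTION: the weight is the model's confidence (max probability);
    the weighting DMax actually uses may differ.
    """
    tok = max(range(len(probs)), key=lambda v: probs[v])
    conf = probs[tok]
    state = [conf * t + (1.0 - conf) * m for t, m in zip(vocab_emb[tok], mask_emb)]
    return tok, conf, state


def soft_parallel_decode(model_fn: Callable[[List[Vector]], List[List[float]]],
                         vocab_emb: Sequence[Vector], mask_emb: Vector,
                         length: int, max_steps: int,
                         threshold: float) -> Tuple[List[int], int]:
    """Refine all positions in embedding space; return (tokens, forward passes)."""
    states = [list(mask_emb) for _ in range(length)]  # start fully masked
    tokens: List[int] = [0] * length
    for step in range(1, max_steps + 1):
        logits = model_fn(states)  # one forward pass over every position
        confs = []
        for i in range(length):
            # Every position is re-predicted, so earlier mistakes can be revised.
            tokens[i], conf, states[i] = soft_state(softmax(logits[i]), vocab_emb, mask_emb)
            confs.append(conf)
        if min(confs) >= threshold:  # ASSUMPTION: simple stopping rule
            return tokens, step
    return tokens, max_steps


# Hypothetical toy "denoiser": one-hot vocabulary, zero mask embedding, a fixed
# bias toward TARGET, plus a bonus for agreeing with the current soft state.
TARGET = [2, 0, 1]
VOCAB = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
MASK = [0.0, 0.0, 0.0]


def toy_model(states: List[Vector]) -> List[List[float]]:
    return [[3.0 * sum(s * e for s, e in zip(state, VOCAB[v]))
             + (2.0 if v == TARGET[i] else 0.0)
             for v in range(len(VOCAB))]
            for i, state in enumerate(states)]


tokens, passes = soft_parallel_decode(toy_model, VOCAB, MASK, length=3,
                                      max_steps=8, threshold=0.95)
print(tokens, passes, len(tokens) / passes)  # -> [2, 0, 1] 2 1.5
```

In this toy run the first pass leaves every position at about 0.79 confidence, so the states are only partially "unmasked"; the second pass sees those soft states and becomes confident enough to stop, giving a TPF of 1.5. Keeping low-confidence positions close to the mask embedding, rather than committing a hard token, is what leaves room for the model to revise them later.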


Code & Models

Repositories

czg1225/DMax (GitHub)
