Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

Liran Ringel; Ameen Ali; Yaniv Romano

arXiv:2604.02560·cs.CL·April 6, 2026

TL;DR

This paper introduces DEMASK, a dependency-guided method for parallel decoding in discrete diffusion language models, improving speed while maintaining or enhancing output quality.

Contribution

DEMASK is a lightweight dependency predictor that guides parallel token unmasking, reducing distributional mismatch in discrete diffusion language models.

Findings

01 DEMASK achieves a 1.7-2.2× speedup on Dream-7B.

02 DEMASK matches or improves accuracy compared to baseline methods.

03 Theoretical bounds relate dependency estimation to sampling quality.

Abstract

Discrete diffusion language models (dLLMs) accelerate text generation by unmasking multiple tokens in parallel. However, parallel decoding introduces a distributional mismatch: it approximates the joint conditional using a fully factorized product of per-token marginals, which degrades output quality when selected tokens are strongly dependent. We propose DEMASK (DEpendency-guided unMASKing), a lightweight dependency predictor that attaches to the final hidden states of a dLLM. In a single forward pass, it estimates pairwise conditional influences between masked positions. Using these predictions, a greedy selection algorithm identifies positions with bounded cumulative dependency for simultaneous unmasking. Under a sub-additivity assumption, we prove this bounds the total variation distance between our parallel sampling and the model's joint. Empirically, DEMASK achieves…
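
The greedy selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_select`, the confidence-based visiting order, the symmetric way pairwise scores are summed, and the `budget` threshold are all assumptions made for illustration. The matrix `dep` stands in for the pairwise influence estimates that the dependency predictor would produce in a single forward pass.

```python
from typing import List, Sequence


def greedy_select(
    dep: Sequence[Sequence[float]],
    confidence: Sequence[float],
    budget: float,
) -> List[int]:
    """Choose masked positions to unmask in the same step.

    dep[i][j]   -- hypothetical non-negative estimate of how strongly
                   position i influences the prediction at position j.
    confidence  -- hypothetical per-position score used only to decide
                   the order in which candidates are visited.
    budget      -- hypothetical cap on the cumulative pairwise dependency
                   among the selected positions.
    """
    # Visit candidates from most to least confident (an assumed ordering).
    order = sorted(range(len(confidence)), key=lambda i: -confidence[i])

    selected: List[int] = []
    total = 0.0
    for i in order:
        # Dependency this candidate would add against everything chosen so far.
        added = sum(dep[i][j] + dep[j][i] for j in selected)
        # Always accept the first candidate so each step unmasks at least one token.
        if not selected or total + added <= budget:
            selected.append(i)
            total += added
    return selected


# Toy example: positions 0 and 1 depend strongly on each other,
# position 2 is nearly independent of both.
dep = [
    [0.00, 0.90, 0.05],
    [0.90, 0.00, 0.02],
    [0.05, 0.02, 0.00],
]
confidence = [0.95, 0.90, 0.60]

print(greedy_select(dep, confidence, budget=0.2))  # → [0, 2]
```

In this toy case, position 1 is deferred to a later step even though it is more confident than position 2, because unmasking it together with position 0 would exceed the dependency budget. A larger budget trades fewer decoding steps for a looser factorized approximation.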
