Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs
Bumjun Kim, Dongjae Jeon, Moongyu Jeon, Albert No

TL;DR
This paper introduces Dependency-Aware Parallel Decoding (DAPD), a training-free method that uses self-attention to model dependencies among masked tokens, so that a diffusion LLM can unmask multiple tokens per step without co-updating strongly coupled ones, improving the accuracy-steps trade-off.
Contribution
The paper presents DAPD, a novel decoding approach that uses self-attention to identify token dependencies, enabling effective parallel decoding without retraining or auxiliary models.
Findings
DAPD improves the accuracy-steps trade-off over existing methods.
DAPD enables more globally distributed parallel updates.
Experimental results on LLaDA and Dream validate DAPD's effectiveness.
Abstract
Parallel decoding for diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We propose Dependency-Aware Parallel Decoding (DAPD), a simple, training-free decoding method that uses self-attention to induce a conditional dependency graph over masked tokens. At each iteration, edges in this graph capture strong token interactions, while non-edges indicate weak dependence. Parallel decoding is then reduced to selecting an independent set on the graph and unmasking the selected tokens in parallel. This avoids co-updating strongly coupled tokens without auxiliary models or retraining. Experiments on LLaDA and Dream show that DAPD improves the accuracy-steps trade-off over existing methods and enables more globally distributed parallel…
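The graph-then-independent-set idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the threshold `tau`, the max-based symmetrization of attention scores, the confidence-ordered greedy selection, and the function names are all assumptions made for illustration. The abstract states only that attention induces the graph and that an independent set is unmasked in parallel.

```python
from typing import Dict, List, Sequence, Set


def build_dependency_graph(
    attn: Sequence[Sequence[float]],
    masked: Sequence[int],
    tau: float,
) -> Dict[int, Set[int]]:
    """Connect two masked positions when their attention score is at least tau.

    attn[i][j] is a hypothetical attention weight from position i to j
    (for example, averaged over heads); how DAPD aggregates attention is
    not specified in the abstract.
    """
    edges: Dict[int, Set[int]] = {i: set() for i in masked}
    for a, i in enumerate(masked):
        for j in masked[a + 1:]:
            # Assumption: symmetrize by taking the larger direction.
            if max(attn[i][j], attn[j][i]) >= tau:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def select_independent_set(
    edges: Dict[int, Set[int]],
    confidence: Dict[int, float],
) -> List[int]:
    """Greedily pick masked positions so that no two picked ones share an edge.

    Assumption: candidates are visited in order of decreasing confidence.
    """
    selected: List[int] = []
    blocked: Set[int] = set()
    for i in sorted(edges, key=lambda p: confidence[p], reverse=True):
        if i not in blocked:
            selected.append(i)
            blocked.add(i)
            blocked.update(edges[i])  # neighbours must wait for a later step
    return sorted(selected)


# Toy example: positions 1, 2, 3 are masked; 1 and 2 attend strongly to each other.
attn = [
    [0.0, 0.1, 0.1, 0.1],
    [0.1, 0.0, 0.6, 0.1],
    [0.1, 0.5, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
graph = build_dependency_graph(attn, masked=[1, 2, 3], tau=0.3)
to_unmask = select_independent_set(graph, {1: 0.9, 2: 0.8, 3: 0.7})
print(to_unmask)  # positions 1 and 3; position 2 is deferred
```

In this toy case positions 1 and 2 are strongly coupled, so only the more confident of the two is unmasked in the current step, while the weakly coupled position 3 is unmasked alongside it.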
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Graph Neural Networks · Topic Modeling
