Attention-Based Sampler for Diffusion Language Models
Yuyan Zhou, Kai Syun Hou, Weiyu Chen, James Kwok

TL;DR
This paper introduces Attn-Sampler, a decoding algorithm for diffusion language models that uses attention scores to choose the token decoding order, with the aim of approximately maximizing sequence likelihood while improving both efficiency and generation quality.
Contribution
It provides a theoretically grounded attention-guided decoding method, Attn-Sampler, that enhances parallelism and performance without additional training.
Findings
Attn-Sampler achieves superior generation quality compared to baseline methods.
The method enhances decoding parallelism, reducing inference time.
Theoretical analysis justifies attention-based decoding as near-optimal for sequence likelihood.
Abstract
Auto-regressive models (ARMs) have established a dominant paradigm in language modeling. However, their strictly sequential decoding imposes fundamental constraints on both inference efficiency and modeling flexibility. To address these limitations, diffusion-based large language models (dLLMs) have been proposed, offering the potential for parallel decoding and flexible language modeling. Despite these advantages, current dLLM decoding strategies rely primarily on token-level information, which fails to account for global sequence structure and often yields suboptimal results. In this paper, we study the decoding order selection problem from the perspective of log-likelihood maximization. We theoretically demonstrate that optimal sequence likelihood can be approximately achieved by decoding tokens in descending order of their attention matrix column sums. This finding…
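The ordering rule stated in the abstract (decode tokens in descending order of their attention matrix column sums) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `column_sums`, `decoding_order`, and `next_batch`, the toy matrix, and the batch size `k` are all hypothetical, and the excerpt does not say which layers or heads the attention is taken from or how many tokens are unmasked per step. The sketch assumes a single square attention matrix whose entry `attn[i][j]` is the weight that position `i` places on position `j`.

```python
def column_sums(attn):
    """Sum each column of a square attention matrix given as a list of rows.

    Column j's sum measures how much attention position j receives in total.
    """
    n = len(attn)
    return [sum(attn[i][j] for i in range(n)) for j in range(n)]


def decoding_order(attn, masked_positions):
    """Return the masked positions sorted by descending attention column sum.

    Python's sort is stable, so ties keep the order of `masked_positions`.
    """
    sums = column_sums(attn)
    return sorted(masked_positions, key=lambda j: -sums[j])


def next_batch(attn, masked_positions, k):
    """Pick the k masked positions to decode in parallel at this step.

    The fixed batch size k is an assumption made for illustration only.
    """
    return decoding_order(attn, masked_positions)[:k]


# Toy 4x4 attention matrix (each row sums to 1); the values are made up.
attn = [
    [0.1, 0.6, 0.1, 0.2],
    [0.2, 0.5, 0.1, 0.2],
    [0.1, 0.4, 0.2, 0.3],
    [0.3, 0.3, 0.1, 0.3],
]

order = decoding_order(attn, [0, 1, 2, 3])
batch = next_batch(attn, [0, 2, 3], 2)
```

In this toy example position 1 receives the most total attention, so it would be decoded first. In an actual sampler the attention matrix would presumably be recomputed after each unmasking step, since newly decoded tokens change the attention pattern.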
