PulseCol: Periodically Refreshed Column-Sparse Attention for Accelerating Diffusion Language Models
Yanyi Lyu, Letian Chen, Futing Sun, Miao Zhang, Weili Guan, Liqiang Nie

TL;DR
PulseCol introduces a novel column-sparse attention mechanism that reuses sparse patterns across denoising steps, significantly accelerating diffusion language models while preserving quality.
Contribution
It proposes a finer-grained, periodically refreshed column-sparse attention method that delivers greater efficiency and speedup in diffusion language models than prior block-sparse attention techniques.
Findings
PulseCol achieves up to 1.95× speedup over FlashAttention.
It maintains model quality while increasing sparsity and efficiency.
It reuses sparse patterns across denoising iterations for further acceleration.
Abstract
Inference in diffusion large language models (dLLMs) is computationally expensive, as full self-attention must be executed repeatedly at each step of the denoising process without a KV cache. Recent sparse attention methods for dLLMs mitigate this cost via block-sparse computation, but they apply it only in later iterations, when model performance is less sensitive to coarse-grained sparse approximation, and therefore yield limited improvements in computational efficiency and acceleration. This motivates a finer-grained sparsification strategy that can be applied from earlier iterations and leverages reusable sparsity patterns, enabling further efficiency gains. In this work, we introduce PulseCol, a periodically refreshed column-sparse attention method for accelerating diffusion language models. PulseCol replaces coarse block-level sparsity with a finer-grained column-sparse structure, allowing…
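To make the idea of a periodically refreshed column-sparse pattern concrete, here is a minimal pure-Python sketch under stated assumptions; it is not PulseCol's implementation. We assume that "column-sparse" means every query attends to one shared subset of key columns, that this subset is chosen from a dense step by total attention mass, and that it is recomputed every `refresh_every` denoising steps and reused in between. All names and parameters (`pulsed_attention`, `select_columns`, `refresh_every`, `keep`) are hypothetical.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def scores(q, K):
    # Scaled dot-product scores of one query against a list of keys.
    d = len(q)
    return [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]


def weighted_values(p, V):
    # Output row = sum_j p[j] * V[j].
    return [sum(pj * v[c] for pj, v in zip(p, V)) for c in range(len(V[0]))]


def full_attention(Q, K, V):
    """Dense attention; also returns the probability matrix for column selection."""
    probs = [softmax(scores(q, K)) for q in Q]
    return [weighted_values(p, V) for p in probs], probs


def select_columns(probs, keep):
    """Hypothetical criterion: keep the `keep` key columns with the largest
    attention mass summed over all queries, returned in index order."""
    n_keys = len(probs[0])
    mass = [sum(row[j] for row in probs) for j in range(n_keys)]
    return sorted(sorted(range(n_keys), key=lambda j: -mass[j])[:keep])


def column_sparse_attention(Q, K, V, cols):
    """Every query attends only to the shared key columns listed in `cols`."""
    Ks = [K[j] for j in cols]
    Vs = [V[j] for j in cols]
    return [weighted_values(softmax(scores(q, Ks)), Vs) for q in Q]


def pulsed_attention(step_inputs, refresh_every, keep):
    """Run attention over denoising steps: dense (and re-select columns) on
    refresh steps, column-sparse with the cached columns otherwise."""
    outputs, dense_steps, cols = [], [], None
    for t, (Q, K, V) in enumerate(step_inputs):
        if t % refresh_every == 0:
            out, probs = full_attention(Q, K, V)
            cols = select_columns(probs, keep)
            dense_steps.append(t)
        else:
            out = column_sparse_attention(Q, K, V, cols)
        outputs.append(out)
    return outputs, dense_steps, cols


if __name__ == "__main__":
    rng = random.Random(0)

    def mat(r, c):
        return [[rng.gauss(0.0, 1.0) for _ in range(c)] for _ in range(r)]

    # 6 toy denoising steps, 8 tokens, head dimension 4.
    steps = [(mat(8, 4), mat(8, 4), mat(8, 4)) for _ in range(6)]
    _, dense, cols = pulsed_attention(steps, refresh_every=3, keep=4)
    print("dense steps:", dense, "kept columns:", cols)
```

In this sketch a dense step costs O(n²d) for n tokens, while each reuse step costs O(nkd) with k = `keep` columns, so the savings grow as more steps reuse a cached pattern; how PulseCol actually selects, refreshes, and executes its columns on GPU is described in the paper, not here.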
