Dynamic Chunking for Diffusion Language Models
Yichen Zhu, Xiaoming Shi, Peng Zhao, Weiyu Chen, Debing Zhang, James Kwok

TL;DR
The paper introduces DCDM, a diffusion language model that replaces fixed-size positional blocks with content-defined semantic chunks and outperforms fixed-position block diffusion methods.
Contribution
The paper proposes Chunking Attention, a differentiable layer that routes tokens into clusters parameterized by learnable subspaces, so that chunk membership is learned end-to-end from the diffusion objective rather than fixed by position.
Findings
DCDM outperforms fixed-position block diffusion models on multiple benchmarks.
Semantic chunks improve sequence likelihood factorization.
The advantage holds across model scales and is already visible early in training.
Abstract
Block discrete diffusion language models factorize a sequence autoregressively over fixed-size positional blocks, decoupling within-block parallel denoising from across-block conditioning. We argue that this rigid partition wastes structure already present in the sequence: blocks defined by position rather than by content separate semantically coherent tokens and group unrelated ones together. We introduce the Dynamic Chunking Diffusion Model (DCDM), which replaces positional blocks with content-defined semantic chunks. At its core is Chunking Attention, a differentiable layer that routes tokens into clusters parameterized by learnable subspaces and shaped end-to-end by the diffusion objective. The resulting cluster assignments induce a chunk-causal attention mask under which a discrete diffusion denoiser factorizes the sequence likelihood…
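
To make the contrast between positional blocks and content-defined chunks concrete, here is a minimal sketch of how a chunk-causal attention mask could be built from per-token chunk ids. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the mask definition or how chunks are ordered, so the sketch assumes hard integer chunk ids with a total order, full attention inside a chunk, and attention only to earlier chunks across chunks. The function names and the example assignment `dynamic_ids` are hypothetical.

```python
def chunk_causal_mask(chunk_ids):
    """Return mask[i][j] == True when token i may attend to token j.

    Assumption (not from the paper): token i sees every token whose
    chunk id is less than or equal to its own, i.e. bidirectional
    attention within a chunk and causal attention across chunks.
    """
    n = len(chunk_ids)
    return [[chunk_ids[j] <= chunk_ids[i] for j in range(n)] for i in range(n)]


def positional_chunk_ids(seq_len, block_size):
    """Fixed-size positional blocks: the chunk id depends only on position."""
    return [pos // block_size for pos in range(seq_len)]


# Baseline: six tokens split into positional blocks of size 2.
fixed_ids = positional_chunk_ids(6, 2)
fixed_mask = chunk_causal_mask(fixed_ids)

# Hypothetical content-defined assignment: chunk members need not be
# adjacent. In DCDM these ids would come from the learned routing layer;
# here they are hard-coded for illustration.
dynamic_ids = [0, 1, 0, 1, 2, 0]
dynamic_mask = chunk_causal_mask(dynamic_ids)

for row in dynamic_mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With positional ids the mask is the familiar block-lower-triangular pattern. With content-defined ids the same rule yields a non-contiguous pattern, because tokens in one chunk can sit at distant positions. The sketch uses hard assignments only to show the mask structure and does not model the differentiable routing.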
