Breaking the Factorization Barrier in Diffusion Language Models
Ian Li, Zilei Shao, Benjie Wang, Rose Yu, Guy Van den Broeck, Anji Liu

TL;DR
This paper introduces CoDD, a hybrid diffusion framework that overcomes the factorization barrier in diffusion language models by efficiently modeling joint dependencies among simultaneously predicted tokens, improving both coherence and speed.
Contribution
The paper proposes Coupled Discrete Diffusion (CoDD), a novel method that replaces fully-factorized outputs with a lightweight, expressive probabilistic layer, enabling joint dependency modeling without parameter explosion.
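To make the "parameter explosion" concrete, here is a rough counting sketch. It compares the number of free parameters in a fully factorized output with that of an explicit joint table over the same positions. The vocabulary size and number of positions are illustrative assumptions, not values from the paper, and the sketch does not model CoDD's probabilistic layer itself.

```python
def factorized_params(vocab_size: int, num_positions: int) -> int:
    """Free parameters for independent categoricals, one per position."""
    return num_positions * (vocab_size - 1)


def full_joint_params(vocab_size: int, num_positions: int) -> int:
    """Free parameters for one explicit table over all token combinations."""
    return vocab_size ** num_positions - 1


# Hypothetical sizes chosen only for illustration.
V, K = 50_000, 4
n_factorized = factorized_params(V, K)
n_joint = full_joint_params(V, K)

print(f"factorized: {n_factorized:,} parameters")
print(f"full joint: {n_joint:,} parameters")
```

The factorized count grows linearly in the number of positions, while the explicit joint grows exponentially, which is why a backbone cannot simply emit the full joint table.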
Findings
CoDD enhances diverse diffusion models with negligible overhead.
It matches the reasoning performance of RL baselines at lower training costs.
It prevents performance collapse in few-step generation, reducing latency.
Abstract
Diffusion language models theoretically allow for efficient parallel generation but are practically hindered by the "factorization barrier": the assumption that simultaneously predicted tokens are independent. This limitation forces a trade-off: models must either sacrifice speed by resolving dependencies sequentially or suffer from incoherence due to factorization. We argue that this barrier arises not from limited backbone expressivity, but from a structural misspecification: models are restricted to fully factorized outputs because explicitly parameterizing a joint distribution would require the Transformer to output a prohibitively large number of parameters. We propose Coupled Discrete Diffusion (CoDD), a hybrid framework that breaks this barrier by replacing the fully-factorized output distribution with a lightweight, tractable probabilistic inference layer. This formulation…
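The incoherence caused by factorization can be seen in a toy case. The sketch below assumes a made-up "true" joint distribution over two masked positions, computes its per-position marginals, and measures how much probability the product of those marginals puts on token pairs the joint never produces. The tokens and probabilities are hypothetical and serve only to illustrate the independence assumption described in the abstract.

```python
from itertools import product

# Hypothetical joint over two simultaneously predicted positions.
joint = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}


def marginals(joint_dist):
    """Per-position marginal distributions of a joint over token tuples."""
    num_positions = len(next(iter(joint_dist)))
    margs = [dict() for _ in range(num_positions)]
    for seq, p in joint_dist.items():
        for i, tok in enumerate(seq):
            margs[i][tok] = margs[i].get(tok, 0.0) + p
    return margs


def factorized(margs):
    """Product-of-marginals distribution (the fully factorized output)."""
    dist = {}
    for combo in product(*(m.items() for m in margs)):
        seq = tuple(tok for tok, _ in combo)
        prob = 1.0
        for _, q in combo:
            prob *= q
        dist[seq] = prob
    return dist


fact = factorized(marginals(joint))
# Mass assigned to pairs with zero probability under the true joint.
incoherent_mass = sum(p for seq, p in fact.items() if seq not in joint)

for seq, p in sorted(fact.items()):
    print(seq, p)
print("incoherent mass:", incoherent_mass)
```

In this toy case the marginals are exactly right, yet sampling both positions in parallel gives mixed pairs such as ("New", "Angeles") half of the time. Resolving one position first and conditioning the other on it avoids this, at the cost of an extra sequential step.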
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
