One Token Is Enough: Improving Diffusion Language Models with a Sink Token
Zihou Zhang, Zheyong Xie, Li Zhong, Haifeng Liu, Yao Hu, Shaosheng Cao

TL;DR
This paper introduces a simple method to stabilize diffusion language models by adding a single dedicated sink token, which improves performance and robustness by controlling information flow during text generation.
Contribution
The paper proposes adding an extra sink token, implemented via a modified attention mask, to address the moving sink instability in diffusion language models. The token attends only to itself while remaining visible to all other tokens.
Findings
Introducing a sink token stabilizes attention sinks.
The sink token improves model performance significantly.
The sink token's effectiveness does not depend on its position, and the token itself carries negligible semantic content.
Abstract
Diffusion Language Models (DLMs) have emerged as a compelling alternative to autoregressive approaches, enabling parallel text generation with competitive performance. Despite these advantages, there is a critical instability in DLMs: the moving sink phenomenon. Our analysis indicates that sink tokens exhibit low-norm representations in the Transformer's value space, and that the moving sink phenomenon serves as a protective mechanism in DLMs to prevent excessive information mixing. However, their unpredictable positions across diffusion steps undermine inference robustness. To resolve this, we propose a simple but effective extra sink token implemented via a modified attention mask. Specifically, we introduce a special token constrained to attend solely to itself, while remaining globally visible to all other tokens. Experimental results demonstrate that introducing a single extra…
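The masking rule described in the abstract (a special token that attends solely to itself while staying visible to every other token) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names `sink_attention_mask` and `masked_softmax`, the `sink_index` parameter, and the assumption that all other tokens attend bidirectionally to the full sequence are hypothetical choices made only for illustration.

```python
import numpy as np


def sink_attention_mask(seq_len, sink_index=0):
    """Build a boolean attention mask for seq_len tokens plus one extra sink token.

    mask[q, k] is True when query position q may attend to key position k.
    Hypothetical helper: the name and the sink_index parameter are assumptions.
    """
    n = seq_len + 1
    # Assumption: ordinary tokens use full bidirectional attention.
    mask = np.ones((n, n), dtype=bool)
    # The sink token attends only to itself ...
    mask[sink_index, :] = False
    mask[sink_index, sink_index] = True
    # ... while its column stays True, so every token can still attend to it.
    return mask


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores, ignoring masked-out keys."""
    scores = np.where(mask, scores, -np.inf)
    # Every row keeps at least one allowed key, so the row maximum is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


mask = sink_attention_mask(seq_len=4)
weights = masked_softmax(np.zeros((5, 5)), mask)
```

With uniform scores, the sink row puts all of its attention weight on itself, while each ordinary row spreads its weight evenly over all five positions, including the sink. Changing `sink_index` moves the token without altering the rule.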
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Language and cultural evolution
