Empirical Analysis of Decoding Biases in Masked Diffusion Models
Pengcheng Huang, Tianming Liu, Zhenghao Liu, Yukun Yan, Shuo Wang, Tong Xiao, Zulong Chen, Maosong Sun

TL;DR
This paper investigates the internal attention mechanisms of Masked Diffusion Models, revealing a unique dynamic attention pattern called Attention Floating, which explains their strong in-context learning abilities and performance advantages over autoregressive models.
Contribution
It uncovers the Attention Floating phenomenon in MDMs and explains its role in their superior in-context learning capabilities compared to ARMs.
Findings
MDMs exhibit Attention Floating: attention anchors are dispersed and shift across denoising steps and layers, rather than converging to a fixed sink as in ARMs.
Shallow layers build global structure; deep layers focus on semantic content.
MDMs are reported to achieve double the performance of ARMs on knowledge-intensive tasks.
Abstract
Masked diffusion models (MDMs), which leverage bidirectional attention and a denoising process, are narrowing the performance gap with autoregressive models (ARMs). However, their internal attention mechanisms remain under-explored. This paper investigates the attention behaviors in MDMs, revealing the phenomenon of Attention Floating. Unlike ARMs, where attention converges to a fixed sink, MDMs exhibit dynamic, dispersed attention anchors that shift across denoising steps and layers. Further analysis reveals its Shallow Structure-Aware, Deep Content-Focused attention mechanism: shallow layers utilize floating tokens to build a global structural framework, while deeper layers allocate more capability toward capturing semantic content. Empirically, this distinctive attention pattern provides a mechanistic explanation for the strong in-context learning capabilities of MDMs, allowing them…
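The contrast between a fixed attention sink and a floating anchor can be made concrete with a small sketch. The code below is an illustration only, not the paper's measurement procedure. It assumes the attention weights for each denoising step are available as a query-by-key matrix. It treats the key position that receives the most average attention as the "anchor" and checks whether that position moves between steps. All function names and the toy matrices are hypothetical.

```python
from typing import List, Sequence

# Rows are query positions, columns are key positions; each row sums to 1.
Matrix = Sequence[Sequence[float]]


def received_attention(attn: Matrix) -> List[float]:
    """Average attention that each key position receives over all queries."""
    n_queries = len(attn)
    n_keys = len(attn[0])
    return [sum(row[k] for row in attn) / n_queries for k in range(n_keys)]


def anchor_position(attn: Matrix) -> int:
    """Index of the key position receiving the most average attention."""
    scores = received_attention(attn)
    return max(range(len(scores)), key=scores.__getitem__)


def anchor_trajectory(attn_per_step: Sequence[Matrix]) -> List[int]:
    """Anchor position at each denoising step (one matrix per step)."""
    return [anchor_position(attn) for attn in attn_per_step]


def is_floating(trajectory: Sequence[int]) -> bool:
    """True if the anchor moves at least once; False for a fixed sink."""
    return len(set(trajectory)) > 1


# Toy data (made up for illustration): two steps, three tokens.
sink_like = [
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.2, 0.2]],
    [[0.7, 0.2, 0.1], [0.8, 0.1, 0.1], [0.9, 0.05, 0.05]],
]
floating_like = [
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.2, 0.2]],
    [[0.1, 0.1, 0.8], [0.2, 0.1, 0.7], [0.1, 0.2, 0.7]],
]

sink_traj = anchor_trajectory(sink_like)       # anchor stays at position 0
float_traj = anchor_trajectory(floating_like)  # anchor moves from 0 to 2

print(sink_traj, is_floating(sink_traj))
print(float_traj, is_floating(float_traj))
```

In practice one would repeat this per layer and per head. A single argmax is a coarse summary, since the abstract describes the anchors as dispersed rather than concentrated on one token.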
Taxonomy
Topics: Computational and Text Analysis Methods
