Relaxing Positional Alignment in Masked Diffusion Language Models
Mengyu Ye, Ryosuke Takahashi, Keito Kudo, Jun Suzuki

TL;DR
This paper proposes a flexible positional supervision method for masked diffusion language models, improving their robustness and performance in open-ended text generation by reducing sensitivity to token misalignments.
Contribution
It introduces a <slack> token and a CTC-based training strategy to relax positional constraints in MDLMs, enhancing generation quality and robustness.
Findings
Outperforms original MDLM on five benchmarks
Increases robustness to token misalignments
Improves open-ended text generation quality
Abstract
Masked diffusion language models (MDLMs) have emerged as a promising alternative to dominant autoregressive approaches. Although they achieve competitive performance on several tasks, a substantial gap remains in open-ended text generation. We hypothesize that one cause of this gap is that strict positional prediction makes MDLM decoding highly sensitive to token misalignment, and we show through controlled interventions that a one-position shift can severely disrupt semantics. This observation suggests that enforcing strict positional supervision during training is misaligned with the irreversible denoising dynamics of MDLM decoding. Motivated by this mismatch, we adopt an alignment-flexible supervision strategy during fine-tuning. Specifically, we introduce a special token <slack> via the connectionist temporal classification objective. We apply this approach to the widely used MDLM…
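The contrast between strict positional supervision and alignment-flexible supervision can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes that <slack> plays the role of the blank symbol in standard CTC, where a positional output is collapsed by merging adjacent repeats and then dropping blanks. The helper names `ctc_collapse` and `positional_matches` and the example sentence are hypothetical, and the paper's exact collapse rule may differ.

```python
from itertools import groupby

# Assumption: <slack> behaves like the blank symbol in standard CTC.
SLACK = "<slack>"


def ctc_collapse(tokens, blank=SLACK):
    """Standard CTC collapse: merge adjacent repeats, then drop blanks."""
    merged = [tok for tok, _ in groupby(tokens)]
    return [tok for tok in merged if tok != blank]


def positional_matches(pred, target):
    """Count positions where pred and target hold the same token."""
    return sum(p == t for p, t in zip(pred, target))


# Hypothetical target sentence and two length-5 model outputs.
target = ["the", "cat", "sat", "down"]
target_padded = target + [SLACK]

aligned = ["the", "cat", "sat", "down", SLACK]
shifted = [SLACK, "the", "cat", "sat", "down"]  # one-position shift

# Strict positional comparison: the shifted output matches nowhere.
strict_aligned = positional_matches(aligned, target_padded)
strict_shifted = positional_matches(shifted, target_padded)

# Alignment-flexible comparison: both outputs collapse to the target.
flex_aligned = ctc_collapse(aligned) == target
flex_shifted = ctc_collapse(shifted) == target

print(strict_aligned, strict_shifted, flex_aligned, flex_shifted)
```

Under these assumptions, a position-wise criterion treats the shifted output as entirely wrong, while a collapse-based criterion accepts it as equivalent to the target, which is the kind of tolerance to misalignment the abstract describes.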
Taxonomy
Topics: Language and cultural evolution · Topic Modeling · Generative Adversarial Networks and Image Synthesis
