MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models

Chen-Hao Chao; Wei-Fang Sun; Junwei Quan; Chun-Yi Lee; Rahul G. Krishnan

arXiv:2603.16077·cs.LG·May 22, 2026

TL;DR

MDM-Prime-v2 introduces binary encoding and index shuffling to improve the scalability and performance of diffusion language models, addressing previous limitations in subtokenizer design.

Contribution

The paper presents a subtokenizer design that combines binary encoding with index shuffling, supported by an analysis that guides the choice of token granularity and leads to improved downstream performance.

Findings

01 MDM-Prime-v2 outperforms similar-sized baselines on eight benchmarks.

02 The model demonstrates superior zero-shot accuracy at 1.1B parameters.

03 Analysis guides principled subtokenizer design for diffusion models.

Abstract

Masked diffusion models (MDM) exhibit superior generalization when learned using a Partial masking scheme (Prime). This approach converts tokens into sub-tokens and models the diffusion process at the sub-token level. We identify two limitations of the MDM-Prime framework. First, we find that the functional form of the subtokenizer significantly increases the cross-entropy loss in the objective when paired with commonly used Byte-Pair-Encoding (BPE) tokenizers. Second, we lack tools to guide the hyperparameter choice of the token granularity in the subtokenizer. To address these limitations, we analyze the optimal design of the subtokenizer that minimizes the MDM-Prime training objective and develop MDM-Prime-v2, a masked diffusion language model that incorporates Binary Encoding and Index Shuffling. Our analysis characterizes how token granularity and sub-token entropy influence the…
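To make the idea of converting tokens into sub-tokens concrete, here is a minimal sketch of one way a binary-encoding subtokenizer with index shuffling could look. It is an illustration under stated assumptions, not the authors' implementation: the function names, the fixed random permutation, the seed, and the vocabulary size of 50257 (a GPT-2-style BPE vocabulary) are all hypothetical choices made for this example.

```python
import random


def make_index_shuffle(vocab_size, seed=0):
    """Return a random permutation of token indices and its inverse (assumed design)."""
    perm = list(range(vocab_size))
    random.Random(seed).shuffle(perm)
    inverse = [0] * vocab_size
    for original, shuffled in enumerate(perm):
        inverse[shuffled] = original
    return perm, inverse


def encode(token_id, perm, num_bits):
    """Map a token index to binary sub-tokens, most significant bit first."""
    shuffled = perm[token_id]
    return [(shuffled >> (num_bits - 1 - k)) & 1 for k in range(num_bits)]


def decode(bits, inverse):
    """Rebuild the shuffled index from its bits, then undo the shuffle."""
    shuffled = 0
    for b in bits:
        shuffled = (shuffled << 1) | b
    return inverse[shuffled]


VOCAB_SIZE = 50257  # assumption: GPT-2-style BPE vocabulary size
NUM_BITS = (VOCAB_SIZE - 1).bit_length()  # smallest bit width covering every index
PERM, INVERSE = make_index_shuffle(VOCAB_SIZE)

sub_tokens = encode(1234, PERM, NUM_BITS)
assert decode(sub_tokens, INVERSE) == 1234  # the mapping is invertible
```

In this sketch each token becomes a fixed-length sequence of binary sub-tokens, and the shuffle decouples the bit pattern from the ordering of indices in the BPE vocabulary. How the paper actually constructs the shuffle and chooses the token granularity is not specified in the text above.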

Code & Models

Repositories

chen-hao-chao/mdm-prime-v2 (GitHub)
