LLaDA2.1: Speeding Up Text Diffusion via Token Editing

Tiwei Bie; Maosong Cao; Xiang Cao; Bingsen Chen; Fuyuan Chen; Kun Chen; Lun Du; Daozhuo Feng; Haibo Feng; Mingliang Gong; Zhuocheng Gong; Yanmei Gu; Jian Guan; Kaiyuan Guan; Hongliang He; Zenan Huang; Juyong Jiang; Zhonghui Jiang; Zhenzhong Lan; Chengxi Li; Jianguo Li; Zehuan Li; Huabin Liu; Lin Liu; Guoshan Lu; Yuan Lu; Yuxin Ma; Xingyu Mou; Zhenxuan Pan; Kaida Qiu; Yuji Ren; Jianfeng Tan; Yiding Tian; Zian Wang; Lanning Wei; Tao Wu; Yipeng Xing; Wentao Ye; Liangyu Zha; Tianze Zhang; Xiaolu Zhang; Junbo Zhao; Da Zheng; Hao Zhong; Wanli Zhong; Jun Zhou; Junlin Zhou; Liwang Zhu; Muzhi Zhu; Yihong Zhuang

arXiv:2602.08676·cs.LG·February 16, 2026

TL;DR

LLaDA2.1 introduces a novel token editing approach combined with a configurable decoding scheme and reinforcement learning to significantly improve speed and quality in large diffusion-based language models.

Contribution

The paper presents LLaDA2.1, integrating Token-to-Token editing with Mask-to-Token decoding and a large-scale RL framework, achieving faster decoding and enhanced reasoning and instruction-following.

Findings

01 Achieves up to 892 tokens per second (TPS) on HumanEval+

02 Outperforms prior models on 33 benchmarks

03 Offers flexible modes balancing speed and quality

Abstract

While LLaDA2.0 showcased the scaling potential of 100B-level block-diffusion models and their inherent parallelization, the delicate equilibrium between decoding speed and generation quality has remained an elusive frontier. Today, we unveil LLaDA2.1, a paradigm shift designed to transcend this trade-off. By seamlessly weaving Token-to-Token (T2T) editing into the conventional Mask-to-Token (M2T) scheme, we introduce a joint, configurable threshold-decoding scheme. This structural innovation gives rise to two distinct personas: the Speedy Mode (S Mode), which audaciously lowers the M2T threshold to bypass traditional constraints while relying on T2T to refine the output; and the Quality Mode (Q Mode), which leans into conservative thresholds to secure superior benchmark performance with manageable efficiency degradation. Furthering this evolution, underpinned by an expansive context…
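The joint threshold-decoding idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `threshold_decode_step`, the `MASK` sentinel, the two parameters `m2t_threshold` and `t2t_threshold`, and the exact acceptance rule (compare the model's top-token probability against a threshold) are all assumptions made for illustration. The sketch shows a single step in which masked positions are filled when confidence clears the M2T threshold, and already-filled positions are overwritten when the model confidently prefers a different token (T2T editing).

```python
from typing import List, Optional, Sequence

# Hypothetical sentinel for a position that is still masked.
MASK = None


def threshold_decode_step(
    tokens: Sequence[Optional[int]],
    probs: Sequence[Sequence[float]],
    m2t_threshold: float,
    t2t_threshold: float,
) -> List[Optional[int]]:
    """One illustrative decoding step (assumed rule, not the paper's).

    tokens: current block; MASK marks positions not yet decoded.
    probs:  per-position probability distribution over the vocabulary.
    """
    out = list(tokens)
    for i, (tok, dist) in enumerate(zip(tokens, probs)):
        # Most likely token id and its probability at this position.
        best = max(range(len(dist)), key=dist.__getitem__)
        conf = dist[best]
        if tok is MASK:
            # Mask-to-Token: fill the mask only if confident enough.
            if conf >= m2t_threshold:
                out[i] = best
        elif best != tok and conf >= t2t_threshold:
            # Token-to-Token: revise an earlier choice the model now rejects.
            out[i] = best
    return out


# Toy inputs: a 4-position block over a 3-token vocabulary.
tokens = [MASK, 2, MASK, 1]
probs = [
    [0.10, 0.60, 0.30],
    [0.05, 0.90, 0.05],
    [0.40, 0.30, 0.30],
    [0.20, 0.70, 0.10],
]

# Lower M2T threshold: more masks are filled per step, edits clean up later.
speedy = threshold_decode_step(tokens, probs, m2t_threshold=0.5, t2t_threshold=0.8)
# Conservative thresholds: fewer tokens are committed or revised per step.
quality = threshold_decode_step(tokens, probs, m2t_threshold=0.9, t2t_threshold=0.95)
```

With these toy inputs, the lower thresholds commit one mask and revise one earlier token in a single step, while the conservative thresholds leave the block unchanged. That mirrors, in a simplified way, the speed versus quality trade-off that the S Mode and Q Mode settings are described as controlling.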

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Natural Language Processing Techniques