LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
Inclusion AI, Tiwei Bie, Haoxing Chen, Tieyuan Chen, Zhenglin Cheng, Long Cui, Kai Gan, Zhicheng Huang, Zhenzhong Lan, Haoquan Li, Jianguo Li, Tao Lin, Qi Qin, Hongjun Wang, Xiaomei Wang, Haoyuan Wu, Yi Xin, Junbo Zhao

TL;DR
LLaDA2.0-Uni is a unified multimodal diffusion large language model that handles both understanding and generation of text and images. It matches specialized VLMs on multimodal understanding and speeds up inference through prefix-aware optimizations and few-step distillation.
Contribution
It introduces an architecture that combines a fully semantic discrete tokenizer (SigLIP-VQ), an MoE-based dLLM backbone, and a diffusion decoder for unified multimodal understanding and generation.
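To make the "discrete tokenization" step concrete, here is a minimal sketch of nearest-neighbor vector quantization, the general idea behind turning continuous visual features into token ids. It is not the SigLIP-VQ implementation: the function name `quantize`, the 2-D features, and the four-entry codebook are all hypothetical and chosen only for illustration.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Distance is squared Euclidean. Both arguments are lists of equal-length
    float lists. This is a toy stand-in, not the paper's tokenizer.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in features
    ]


# Hypothetical codebook with four 2-D entries and three "patch" features.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patch_features = [[0.1, -0.2], [0.9, 0.8], [0.2, 1.1]]
token_ids = quantize(patch_features, codebook)
print(token_ids)  # [0, 3, 2]
```

Once images are expressed as integer ids like these, they can share a vocabulary-style interface with text tokens, which is what lets a single backbone apply the same masked-diffusion objective to both modalities.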
Findings
- Matches specialized VLMs in multimodal understanding
- Delivers high-fidelity image generation and editing
- Supports interleaved generation and reasoning
Abstract
We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. By discretizing continuous visual inputs via SigLIP-VQ, the model enables block-level masked diffusion for both text and vision inputs within the backbone, while the decoder reconstructs visual tokens into high-fidelity images. Inference efficiency is enhanced beyond parallel decoding through prefix-aware optimizations in the backbone and few-step distillation in the decoder. Supported by carefully curated large-scale data and a tailored multi-stage training pipeline, LLaDA2.0-Uni matches specialized VLMs in multimodal understanding while delivering strong performance in image generation and editing.…
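The block-level masked diffusion mentioned in the abstract can be illustrated with a toy decoding loop. This is a sketch under stated assumptions, not the authors' algorithm: the `MASK` id, the random `toy_denoiser` standing in for the backbone, and the confidence-based unmasking schedule are all hypothetical, and the prefix-aware optimizations are not modeled.

```python
import math
import random

MASK = -1  # hypothetical id for the mask token


def toy_denoiser(tokens, vocab_size, rng):
    """Stand-in for the dLLM backbone (assumption): one random
    (token, confidence) guess per position, ignoring the context."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def block_masked_decode(prompt, num_new, block_size=4, steps_per_block=2,
                        vocab_size=100, seed=0):
    """Fill `num_new` masked positions block by block, left to right.

    Within a block, several positions are committed in parallel per step,
    most-confident first, until the block contains no mask tokens.
    """
    rng = random.Random(seed)
    seq = list(prompt) + [MASK] * num_new
    for start in range(len(prompt), len(seq), block_size):
        end = min(start + block_size, len(seq))
        for step in range(steps_per_block):
            masked = [i for i in range(start, end) if seq[i] == MASK]
            if not masked:
                break
            preds = toy_denoiser(seq, vocab_size, rng)
            # Spread the remaining masks evenly over the remaining steps;
            # on the final step this commits everything that is left.
            remaining_steps = steps_per_block - step
            k = math.ceil(len(masked) / remaining_steps)
            best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
            for i in best:
                seq[i] = preds[i][0]
    return seq


decoded = block_masked_decode(prompt=[1, 2, 3], num_new=10)
print(len(decoded), MASK in decoded)  # 13 False
```

With `block_size=4` and `steps_per_block=2`, each block of four tokens is resolved in two denoising passes instead of four sequential ones, which is the sense in which this style of decoding is parallel. A real model would condition each pass on the prompt and on previously committed tokens.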
Code & Models
- 🤗 inclusionAI/LLaDA2.0-Uni (model) · 4.7k dl · ♡ 247
- 🤗 treadon/mlx-llada2-uni (model) · ♡ 2
- 🤗 SanDiegoDude/LLaDA2.0-Uni-bnb-nf4 (model) · 355 dl · ♡ 2
- 🤗 ZR0Z/LLaDA2.0-Uni (model) · 24 dl · ♡ 1
- 🤗 alexcomicart007/LLaDA2.0-Uni (model) · 21 dl
- 🤗 inclusionAI/LLaDA2.0-Uni-FP8 (model) · 2.6k dl · ♡ 3
