Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

Qingyu Shi; Jinbin Bai; Zhuoran Zhao; Wenhao Chai; Kaidong Yu; Jianzong Wu; Shuangyong Song; Yunhai Tong; Xiangtai Li; Xuelong Li; Shuicheng Yan

arXiv:2505.23606·cs.LG·April 14, 2026

TL;DR

Muddit is a unified discrete diffusion transformer that enables fast, parallel multimodal generation across text and images by integrating pretrained visual priors with a lightweight text decoder.

Contribution

The paper introduces Muddit, the second-generation Meissonic model: a unified discrete diffusion transformer that combines strong visual priors from a pretrained text-to-image backbone with a lightweight text decoder for efficient multimodal generation.

Findings

01 Muddit performs competitively with, or better than, larger autoregressive models.

02 It generates text and images quickly by decoding tokens in parallel rather than sequentially.

03 The results indicate that purely discrete diffusion is effective when paired with strong visual priors.

Abstract

Unified generation models aim to handle diverse tasks across modalities -- such as text generation, image generation, and vision-language reasoning -- within a single architecture and decoding paradigm. Autoregressive unified models suffer from slow inference due to sequential decoding, and non-autoregressive unified models suffer from weak generalization due to limited pretrained backbones. We introduce the second-generation Meissonic: Muddit, a unified discrete diffusion transformer that enables fast and parallel generation across both text and image modalities. Unlike prior unified diffusion models trained from scratch, Muddit integrates strong visual priors from a pretrained text-to-image backbone with a lightweight text decoder, enabling flexible and high-quality multimodal generation under a unified architecture. Empirical results show that Muddit achieves competitive or superior…
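
To illustrate why parallel decoding needs fewer model calls than sequential decoding, here is a minimal sketch of mask-based iterative unmasking, a common sampling scheme for discrete diffusion models. It is not Muddit's actual sampler. The names `MASK`, `toy_predict`, and `parallel_unmask`, the cosine schedule, and the random "model" are all assumptions made for illustration.

```python
import math
import random

MASK = -1  # hypothetical sentinel id for a masked token


def toy_predict(tokens, rng, vocab_size=16):
    """Stand-in for a denoising transformer (assumption, not the real model).

    Returns one (token_id, confidence) pair per position in a single call,
    which is what makes the decoding parallel.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_unmask(seq_len, num_steps, seed=0):
    """Fill a fully masked sequence in `num_steps` model calls."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    passes = 0
    for step in range(1, num_steps + 1):
        preds = toy_predict(tokens, rng)  # one forward pass for all positions
        passes += 1
        # Cosine schedule: how many tokens stay masked after this step.
        keep_masked = int(seq_len * math.cos(math.pi / 2 * step / num_steps))
        if step == num_steps:
            keep_masked = 0  # the final step reveals everything that is left
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Commit the most confident predictions among still-masked positions.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[: max(len(masked) - keep_masked, 0)]:
            tokens[i] = preds[i][0]
    return tokens, passes


tokens, passes = parallel_unmask(seq_len=64, num_steps=8)
print(passes, "model calls for", len(tokens), "tokens")
```

In this sketch a 64-token sequence is completed in 8 model calls, whereas token-by-token autoregressive decoding would need one call per token.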

Code & Models

Repositories

m-e-agi-lab/Muddit (GitHub)

Models

MeissonFlow/Muddit (Hugging Face model)