Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Yuxuan Song; Zheng Zhang; Cheng Luo; Pengyang Gao; Fan Xia; Hao Luo; Zheng Li; Yuehang Yang; Hongli Yu; Xingwei Qu; Yuwei Fu; Jing Su; Ge Zhang; Wenhao Huang; Mingxuan Wang; Lin Yan; Xiaoying Jia; Jingjing Liu; Wei-Ying Ma; Ya-Qin Zhang; Yonghui Wu; Hao Zhou

arXiv:2508.02193 · cs.CL · August 5, 2025

TL;DR

Seed Diffusion Preview is a large-scale discrete-state diffusion language model that generates tokens in parallel rather than one at a time. It reaches 2,146 tokens/s on H20 GPUs while remaining competitive on standard code benchmarks.

Contribution

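This paper introduces Seed Diffusion Preview, a large-scale language model based on discrete-state diffusion. Its non-sequential, parallel generation reduces the latency inherent in token-by-token decoding, and the authors report a new state of the art on the speed-quality Pareto frontier for code models.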

Findings

01 Achieves an inference speed of 2,146 tokens/s on H20 GPUs.

02 Maintains competitive performance across a sweep of standard code evaluation benchmarks.

03 Is reported as significantly faster than the contemporary Mercury Coder and Gemini Diffusion models while remaining competitive in quality.

Abstract

We present Seed Diffusion Preview, a large-scale language model based on discrete-state diffusion, offering remarkably fast inference speed. Thanks to non-sequential, parallel generation, discrete diffusion models provide a notable speedup that mitigates the inherent latency of token-by-token decoding, as demonstrated recently (e.g., Mercury Coder, Gemini Diffusion). Seed Diffusion Preview achieves an inference speed of 2,146 tokens/s on H20 GPUs while maintaining competitive performance across a sweep of standard code evaluation benchmarks. This is significantly faster than the contemporary Mercury and Gemini Diffusion models, establishing a new state of the art on the speed-quality Pareto frontier for code models.
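To make the contrast between token-by-token decoding and parallel generation concrete, here is a minimal toy sketch in Python. It is not the Seed Diffusion algorithm. The names toy_model, autoregressive_decode, parallel_decode, and seconds_for are hypothetical, the "model" is a stub, and filling a fixed number of randomly chosen masked positions per step is an assumption made only for illustration. The sketch counts forward passes: one per token when decoding sequentially, versus roughly length / tokens_per_step when several positions are filled per pass.

```python
import random

MASK = None  # hypothetical marker for a position that has not been generated yet


def toy_model(tokens):
    """Stub for one network forward pass: proposes a token for every position."""
    return [f"tok{i}" for i in range(len(tokens))]


def autoregressive_decode(length):
    """Token-by-token decoding: one forward pass per generated token."""
    tokens, calls = [], 0
    for _ in range(length):
        proposal = toy_model(tokens + [MASK])  # predict the next position only
        calls += 1
        tokens.append(proposal[-1])
    return tokens, calls


def parallel_decode(length, tokens_per_step, seed=0):
    """Toy parallel decoding: start fully masked, fill several positions per pass.

    Assumption for illustration: each pass commits a fixed number of randomly
    chosen masked positions. Real diffusion samplers choose positions differently.
    """
    rng = random.Random(seed)
    tokens = [MASK] * length
    calls = 0
    while any(t is MASK for t in tokens):
        proposal = toy_model(tokens)  # one pass proposes all positions at once
        calls += 1
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        for i in rng.sample(masked, min(tokens_per_step, len(masked))):
            tokens[i] = proposal[i]
    return tokens, calls


def seconds_for(num_tokens, tokens_per_second=2146.0):
    """Wall-clock time implied by a throughput figure (default: the reported 2,146 tokens/s)."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    _, ar_calls = autoregressive_decode(16)
    _, par_calls = parallel_decode(16, tokens_per_step=4)
    print(f"sequential passes: {ar_calls}, parallel passes: {par_calls}")
    print(f"1,000 tokens at 2,146 tokens/s: {seconds_for(1000):.2f} s")
```

In this toy setting the output sequence is identical for both decoders and only the number of forward passes differs. In a real model, filling more positions per pass trades quality against speed, which is the trade-off the abstract's speed-quality Pareto frontier refers to.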

Code & Models

Models

🤗 JorgeVanco/diffusionGPT · model · 326 downloads · 1 like

Taxonomy

Topics: Natural Language Processing Techniques · Parallel Computing and Optimization Techniques · Speech Recognition and Synthesis