Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Z-Image Team, Huanqia Cai, Sihan Cao, Ruoyi Du, Peng Gao, Steven Hoi, Zhaohui Hou, Shijie Huang, Dengyang Jiang, Xin Jin, Liangchen Li, Zhen Li, Zhong-Yu Li, David Liu, Dongyang Liu, Junhan Shi, Qilong Wu, Feng Yu, Chi Zhang, Shifeng Zhang, Shilin Zhou

TL;DR
Z-Image introduces a highly efficient 6B-parameter image generation model using a novel single-stream diffusion transformer, achieving competitive performance with significantly reduced computational resources and enabling broad accessibility.
Contribution
The paper presents Z-Image, a scalable, efficient image generation model with a streamlined training process and a new architecture that challenges the scale-at-all-costs paradigm in the field.
Findings
Achieves performance comparable to top-tier models.
Enables sub-second inference on consumer hardware.
Reduces training costs to approximately $630K.
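The abstract gives the training budget as 314K H800 GPU hours, roughly $630K. The sketch below back-calculates the hourly rate those two figures imply and converts the GPU hours into wall-clock days. The cluster size of 1,024 GPUs is purely a hypothetical for illustration and does not come from the paper.

```python
# Figures reported in the abstract.
GPU_HOURS = 314_000          # H800 GPU hours for the full training workflow
TOTAL_COST_USD = 630_000     # approximate total cost


def implied_hourly_rate(total_cost_usd: float, gpu_hours: float) -> float:
    """Return the per-GPU-hour price implied by a total cost and GPU-hour budget."""
    return total_cost_usd / gpu_hours


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Return wall-clock days, assuming perfect utilization of num_gpus GPUs."""
    return gpu_hours / num_gpus / 24


rate = implied_hourly_rate(TOTAL_COST_USD, GPU_HOURS)
# num_gpus=1024 is a hypothetical cluster size, not a figure from the paper.
days = wall_clock_days(GPU_HOURS, num_gpus=1024)

print(f"Implied rate: ${rate:.2f} per GPU hour")
print(f"Wall-clock time on 1024 GPUs: {days:.1f} days")
```

The implied rate comes out near $2 per H800 GPU hour, so the dollar figure is a rental-price estimate rather than a measured bill.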
Abstract
The landscape of high-performance image generation models is currently dominated by proprietary systems, such as Nano Banana Pro and Seedream 4.0. Leading open-source alternatives, including Qwen-Image, Hunyuan-Image-3.0 and FLUX.2, are characterized by massive parameter counts (20B to 80B), making them impractical for inference and fine-tuning on consumer-grade hardware. To address this gap, we propose Z-Image, an efficient 6B-parameter foundation generative model built upon a Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture that challenges the "scale-at-all-costs" paradigm. By systematically optimizing the entire model lifecycle -- from a curated data infrastructure to a streamlined training curriculum -- we complete the full training workflow in just 314K H800 GPU hours (approx. $630K). Our few-step distillation scheme with reward post-training further yields…
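To make the parameter-count comparison concrete, the following sketch estimates the memory needed just to hold model weights at several numeric precisions. It is a rough lower bound under stated assumptions: it counts weights only (no activations, text encoder, VAE, or optimizer state), and the bytes-per-parameter values are the standard sizes for each format, not measurements from the paper.

```python
# Approximate storage per parameter for common weight formats (assumed values).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Return the approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1e9


# 6B is Z-Image; 20B and 80B bound the range quoted for open-source alternatives.
for size in (6, 20, 80):
    row = ", ".join(
        f"{dtype}: {weight_memory_gb(size, dtype):6.1f} GB"
        for dtype in BYTES_PER_PARAM
    )
    print(f"{size:>2}B params -> {row}")
```

At bf16, a 6B model needs about 12 GB for weights, while the 20B to 80B range needs about 40 to 160 GB, which is the gap the abstract points to.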
Code & Models
- 🤗 Tongyi-MAI/Z-Image-Turbo · model · 824k dl · ♡ 4375
- 🤗 unsloth/Z-Image-Turbo-GGUF · model · 39k dl · ♡ 120
- 🤗 Tongyi-MAI/Z-Image · model · 51k dl · ♡ 1023
- 🤗 unsloth/Z-Image-GGUF · model · 14k dl · ♡ 141
- 🤗 hsuwill000/Z-Image-Turbo-ov · model · ♡ 1
- 🤗 unsloth/Z-Image-Turbo-unsloth-bnb-4bit · model · 439 dl · ♡ 5
- 🤗 drbaph/Z-Image-fp8 · model · 6.0k dl · ♡ 22
- 🤗 not-pegasus/IMAGE_MODAL · model · 5 dl
- 🤗 tsqn/Z-Image-Turbo_fp32-fp16-bf16_full_and_ema-only · model · 609 dl · ♡ 12
- 🤗 kp-forks/Z-Image-Turbo · model · 1 dl
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Neural Network Applications
