Qwen-Image Technical Report

Chenfei Wu; Jiahao Li; Jingren Zhou; Junyang Lin; Kaiyuan Gao; Kun Yan; Sheng-ming Yin; Shuai Bai; Xiao Xu; Yilei Chen; Yuxiang Chen; Zecheng Tang; Zekai Zhang; Zhengyi Wang; An Yang; Bowen Yu; Chen Cheng; Dayiheng Liu; Deqing Li; Hang Zhang; Hao Meng; Hu Wei; Jingyuan Ni; Kai Chen; Kuan Cao; Liang Peng; Lin Qu; Minggang Wu; Peng Wang; Shuting Yu; Tingkun Wen; Wensen Feng; Xiaoxiao Xu; Yi Wang; Yichang Zhang; Yongqiang Zhu; Yujia Wu; Yuxuan Cai; Zenan Liu

arXiv:2508.02324·cs.CV·August 5, 2025

TL;DR

Qwen-Image is an image generation foundation model that improves complex text rendering and precise image editing through a curriculum learning strategy and multi-task training, with strong results in both English and Chinese and across multiple benchmarks.

Contribution

The paper introduces Qwen-Image, a novel image generation and editing model with a comprehensive data pipeline, progressive training, and dual-encoding for enhanced performance and consistency.

Findings

01

Achieves state-of-the-art results in image generation and editing.

02

Renders text well in both alphabetic languages such as English and logographic languages such as Chinese.

03

Demonstrates strong performance across multiple benchmarks.

Abstract

We present Qwen-Image, an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. To address the challenges of complex text rendering, we design a comprehensive data pipeline that includes large-scale data collection, filtering, annotation, synthesis, and balancing. Moreover, we adopt a progressive training strategy that starts with non-text-to-text rendering, evolves from simple to complex textual inputs, and gradually scales up to paragraph-level descriptions. This curriculum learning approach substantially enhances the model's native text rendering capabilities. As a result, Qwen-Image not only performs exceptionally well in alphabetic languages such as English, but also achieves remarkable progress on more challenging logographic languages like Chinese. To enhance image editing consistency, we…
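
The progressive strategy described in the abstract (non-text first, then simple text, then complex text, then paragraph-level descriptions) can be pictured as a step-indexed curriculum. The sketch below is only an illustration under stated assumptions: the stage names, the step boundaries, and the choice to keep earlier data in the mix are all hypothetical and are not taken from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    start_step: int  # hypothetical training step at which the stage opens


# Assumption: the abstract gives only the ordering of stages.
# These names and boundaries are invented for illustration.
STAGES = (
    Stage("non_text", 0),
    Stage("simple_text", 1_000),
    Stage("complex_text", 3_000),
    Stage("paragraph", 6_000),
)


def stage_for_step(step: int) -> Stage:
    """Return the latest stage whose start_step has been reached."""
    if step < 0:
        raise ValueError("step must be non-negative")
    current = STAGES[0]
    for stage in STAGES:
        if step >= stage.start_step:
            current = stage
    return current


def allowed_buckets(step: int) -> list[str]:
    """Data buckets open at this step.

    Assumption: earlier buckets stay available once a harder one opens.
    """
    idx = STAGES.index(stage_for_step(step))
    return [s.name for s in STAGES[: idx + 1]]
```

A data loader could call `allowed_buckets(step)` each iteration to decide which kinds of samples may be drawn; how the actual training mixes or weights these samples is not stated in the text shown here.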
