Qwen-Image Technical Report
Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng-ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, Yuxiang Chen, Zecheng Tang, Zekai Zhang, Zhengyi Wang, An Yang, Bowen Yu, Chen Cheng, Dayiheng Liu, Deqing Li, Hang Zhang, Hao Meng, Hu Wei, Jingyuan Ni

TL;DR
Qwen-Image is a new foundation model that significantly improves complex text rendering and precise image editing by employing a curriculum learning strategy and multi-task training, excelling in multiple languages and benchmarks.
Contribution
The paper introduces Qwen-Image, a novel image generation and editing model with a comprehensive data pipeline, progressive training, and dual-encoding for enhanced performance and consistency.
Findings
Achieves state-of-the-art results in image generation and editing.
Renders text well in alphabetic languages such as English and makes marked progress on logographic languages such as Chinese.
Demonstrates strong performance across multiple benchmarks.
Abstract
We present Qwen-Image, an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. To address the challenges of complex text rendering, we design a comprehensive data pipeline that includes large-scale data collection, filtering, annotation, synthesis, and balancing. Moreover, we adopt a progressive training strategy that starts with non-text-to-text rendering, evolves from simple to complex textual inputs, and gradually scales up to paragraph-level descriptions. This curriculum learning approach substantially enhances the model's native text rendering capabilities. As a result, Qwen-Image not only performs exceptionally well in alphabetic languages such as English, but also achieves remarkable progress on more challenging logographic languages like Chinese. To enhance image editing consistency, we…
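The progressive training strategy in the abstract can be pictured as a staged schedule that admits longer rendered text as training advances. The following is a minimal sketch under stated assumptions, not the authors' implementation: the stage names, the character caps, the step budgets, and the helpers `stage_at` and `admits` are all hypothetical. The abstract gives only the ordering (non-text first, then simple to complex text, then paragraph-level descriptions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    max_text_chars: int  # hypothetical cap on rendered-text length; 0 means no text


# Hypothetical (stage, step budget) pairs. Only the ordering follows the abstract;
# every number here is an illustrative assumption.
CURRICULUM = [
    (Stage("non-text", 0), 1000),
    (Stage("simple-text", 20), 1000),
    (Stage("complex-text", 100), 1000),
    (Stage("paragraph", 1000), 1000),
]


def stage_at(step: int) -> Stage:
    """Return the curriculum stage active at a given training step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    boundary = 0
    for stage, budget in CURRICULUM:
        boundary += budget
        if step < boundary:
            return stage
    # Past the final boundary, stay in the last (hardest) stage.
    return CURRICULUM[-1][0]


def admits(stage: Stage, rendered_text: str) -> bool:
    """Check whether a sample's rendered text fits the stage's length cap."""
    return len(rendered_text) <= stage.max_text_chars


if __name__ == "__main__":
    for step in (0, 1500, 2500, 5000):
        print(step, stage_at(step).name)
```

A data loader could call `admits(stage_at(step), text)` to filter samples per step. A real pipeline would more likely reweight sampling probabilities than apply a hard length cutoff.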
Code & Models
- 🤗 Qwen/Qwen-Image-2512 · model · 96k dl · ♡ 752
- 🤗 Qwen/Qwen-Image-Edit-2511 · model · 167k dl · ♡ 899
- 🤗 unsloth/Qwen-Image-Edit-2511-GGUF · model · 100k dl · ♡ 422
- 🤗 Qwen/Qwen-Image-Edit · model · 70k dl · ♡ 2355
- 🤗 Qwen/Qwen-Image · model · 233k dl · ♡ 2438
- 🤗 Azily/Macro-Qwen-Image-Edit · model · 18 dl · ♡ 5
- 🤗 unsloth/Qwen-Image-2512-GGUF · model · 54k dl · ♡ 323
- 🤗 Qwen/Qwen-Image-Edit-2509 · model · 225k dl · ♡ 1089
- 🤗 1038lab/Qwen-Image-Edit-2511-FP8 · model · 11k dl · ♡ 38
- 🤗 drbaph/Qwen-Image-Edit-2511-FP8 · model · 3.2k dl · ♡ 14
