Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
Shengming Yin, Zekai Zhang, Zecheng Tang, Kaiyuan Gao, Xiao Xu, Kun Yan, Jiahao Li, Yilei Chen, Yuxiang Chen, Heung-Yeung Shum, Lionel M. Ni, Jingren Zhou, Junyang Lin, Chenfei Wu

TL;DR
Qwen-Image-Layered introduces an end-to-end diffusion model that decomposes images into editable, semantically disentangled RGBA layers, enabling consistent and isolated image editing akin to professional design tools.
Contribution
It proposes a novel multilayer image decomposition framework with variable layer handling, supported by a new pipeline for extracting multilayer images from Photoshop files.
Findings
Outperforms existing methods in decomposition quality
Enables inherently editable image representations
Establishes a new paradigm for consistent image editing
Abstract
Recent visual generative models often struggle with consistency during image editing due to the entangled nature of raster images, where all visual content is fused into a single canvas. In contrast, professional design tools employ layered representations, allowing isolated edits while preserving consistency. Motivated by this, we propose Qwen-Image-Layered, an end-to-end diffusion model that decomposes a single RGB image into multiple semantically disentangled RGBA layers, enabling inherent editability, where each RGBA layer can be independently manipulated without affecting other content. To support variable-length decomposition, we introduce three key components: (1) an RGBA-VAE to unify the latent representations of RGB and RGBA images; (2) a VLD-MMDiT (Variable Layers Decomposition MMDiT) architecture capable of decomposing a variable number of image layers; and…
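To make the idea of isolated edits on RGBA layers concrete, here is a minimal sketch of how a layer stack can be flattened back into a single image and how editing one layer leaves the rest untouched. It assumes standard straight-alpha "over" compositing, which the abstract does not specify; the names `over`, `composite`, `background`, and `foreground` are hypothetical and not part of the Qwen-Image-Layered code.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # (R, G, B, A), each in [0, 1]
Layer = List[List[Pixel]]                  # rows of pixels

def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over' operator for straight (non-premultiplied) alpha."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # fully transparent result

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)

def composite(layers: List[Layer]) -> Layer:
    """Flatten layers, listed bottom-to-top, into one RGBA image."""
    height, width = len(layers[0]), len(layers[0][0])
    canvas: Layer = [[(0.0, 0.0, 0.0, 0.0)] * width for _ in range(height)]
    for layer in layers:
        canvas = [
            [over(layer[y][x], canvas[y][x]) for x in range(width)]
            for y in range(height)
        ]
    return canvas

# Toy 1x2 image (assumed data): an opaque blue background and a foreground
# layer that is opaque red on the left pixel and transparent on the right.
background: Layer = [[(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]]
foreground: Layer = [[(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)]]

original = composite([background, foreground])

# Edit only the foreground layer: recolor it green, keeping its alpha mask.
edited_fg: Layer = [[(0.0, 1.0, 0.0, a) for (_, _, _, a) in row] for row in foreground]
edited = composite([background, edited_fg])

print(original)
print(edited)
```

In this toy case the recolor changes only the pixel covered by the foreground layer; the pixel where the foreground is transparent is identical before and after the edit, which is the isolation property that a layered representation is meant to provide.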
Code & Models
- 🤗 Qwen/Qwen-Image-Layered (model) · 22k dl · ♡ 1040
- 🤗 unsloth/Qwen-Image-Layered-GGUF (model) · 2.7k dl · ♡ 47
- 🤗 Runware/Qwen-Image-Layered (model) · 206 dl
- 🤗 vantagewithai/Qwen-Image-Layered-GGUF (model) · 285 dl · ♡ 2
- 🤗 mzbac/Qwen-Image-Layered-8bit (model) · 54 dl · ♡ 1
- 🤗 zimengxiong/Qwen-Image-Layered-6bit (model) · ♡ 2
- 🤗 mzbac/Qwen-Image-Layered-6bit (model) · 8 dl · ♡ 1
- 🤗 sunqiang/Qwen-Image-Layered (model)
- 🤗 Frederic75/Qwen-Image-Layered-GGUF (model) · 134 dl
- 🤗 terryrdt/Qwen-Image-Layered (model)
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media and Philosophy · Digital Humanities and Scholarship
