Text2Layer: Layered Image Generation using Latent Diffusion Model
Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien

TL;DR
Text2Layer introduces a diffusion-based approach that generates the background, foreground, layer mask, and composed image simultaneously, with the aim of supporting image compositing workflows and improving mask quality.
Contribution
It presents a layered image generation method that first trains an autoencoder to reconstruct layered images and then trains latent diffusion models on that autoencoder's latent representation.
Findings
High-quality layered image generation demonstrated
Produces higher-quality layer masks than masks obtained from a separate image segmentation step
Initiates a benchmark for layered image synthesis
Abstract
Layer compositing is one of the most popular image editing workflows among both amateurs and professionals. Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective. Instead of generating an image, we propose to generate background, foreground, layer mask, and the composed image simultaneously. To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images and train diffusion models on the latent representation. One benefit of the proposed problem is to enable better compositing workflows in addition to the high-quality image output. Another benefit is producing higher-quality layer masks compared to masks produced by a separate step of image segmentation. Experimental results show that the proposed method is able to generate high-quality layered images and initiates a benchmark for…
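The abstract names four outputs (background, foreground, layer mask, composed image) but does not state how they relate. As a minimal sketch, assuming the standard alpha-blending rule (composed = mask × foreground + (1 − mask) × background), the relationship could look like the following. The function name `composite` and the nested-list image representation are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]      # RGB channels, each in [0, 1]
Image = List[List[Pixel]]      # rows of pixels
Mask = List[List[float]]       # per-pixel foreground opacity in [0, 1]


def composite(foreground: Image, background: Image, mask: Mask) -> Image:
    """Blend two layers per pixel: out = m * fg + (1 - m) * bg.

    Assumption: this is the standard alpha-compositing rule; the
    abstract itself does not spell out the formula.
    """
    if not (len(foreground) == len(background) == len(mask)):
        raise ValueError("layers must have the same height")
    composed: Image = []
    for fg_row, bg_row, m_row in zip(foreground, background, mask):
        if not (len(fg_row) == len(bg_row) == len(m_row)):
            raise ValueError("layers must have the same width")
        composed.append([
            tuple(m * f + (1.0 - m) * b for f, b in zip(fg_px, bg_px))
            for fg_px, bg_px, m in zip(fg_row, bg_row, m_row)
        ])
    return composed


# Toy 1x3 example: red foreground over blue background.
fg = [[(1.0, 0.0, 0.0)] * 3]
bg = [[(0.0, 0.0, 1.0)] * 3]
mask = [[1.0, 0.0, 0.5]]       # opaque, transparent, half-blended
result = composite(fg, bg, mask)
```

In the toy example, a mask value of 1 keeps the foreground pixel, 0 keeps the background pixel, and 0.5 mixes the two equally. Having the mask as an explicit output is what lets an editor re-composite or adjust layers after generation.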
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques · Computer Graphics and Visualization Techniques
Methods: Diffusion
