PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models
Junwen Chen, Heyang Jiang, Yanbin Wang, Keming Wu, Ji Li, Chao Zhang, Keiji Yanai, Dong Chen, Yuhui Yuan

TL;DR
This paper introduces PrismLayers, a large open dataset and a new multi-layer transparent image generation model that enables detailed, editable images from text prompts, advancing creative control in image synthesis.
Contribution
It provides the first high-quality multi-layer transparent image dataset and a novel training-free synthesis pipeline, along with an open-source multi-layer generation model, ART+.
Findings
The PrismLayers dataset contains 200K multi-layer transparent images; PrismLayersPro contains 20K.
ART+ outperforms the original ART model in 60% of user comparisons.
The training-free synthesis pipeline generates high-quality layered images using off-the-shelf diffusion models.
Abstract
Generating high-quality, multi-layer transparent images from text prompts can unlock a new level of creative control, allowing users to edit each layer as effortlessly as editing text outputs from LLMs. However, the development of multi-layer generative models lags behind that of conventional text-to-image models due to the absence of a large, high-quality corpus of multi-layer transparent data. In this paper, we address this fundamental challenge by: (i) releasing the first open, ultra-high-fidelity PrismLayers (PrismLayersPro) dataset of 200K (20K) multi-layer transparent images with accurate alpha mattes, (ii) introducing a training-free synthesis pipeline that generates such data on demand using off-the-shelf diffusion models, and (iii) delivering a strong, open-source multi-layer generation model, ART+, which matches the aesthetics of modern text-to-image generation models. The key…
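To make the idea of per-layer editing concrete, here is a minimal sketch of how a stack of transparent layers with alpha mattes can be flattened into one image. It uses the standard Porter-Duff "over" operator on straight (non-premultiplied) RGBA values. This is an illustration only, not the authors' pipeline or the ART+ model; the names `over` and `composite` and the nested-list image representation are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation for illustration: a pixel is (r, g, b, a) with
# every channel in [0, 1] and straight (non-premultiplied) alpha.
Pixel = Tuple[float, float, float, float]
Layer = List[List[Pixel]]  # rows of pixels

TRANSPARENT: Pixel = (0.0, 0.0, 0.0, 0.0)


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over': place `top` on `bottom` for straight-alpha pixels."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels are fully transparent, so colour is undefined.
        return TRANSPARENT

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: List[Layer]) -> Layer:
    """Flatten same-sized layers, listed bottom-to-top, into one RGBA image."""
    if not layers:
        raise ValueError("need at least one layer")
    height, width = len(layers[0]), len(layers[0][0])
    canvas: Layer = [[TRANSPARENT] * width for _ in range(height)]
    for layer in layers:
        if len(layer) != height or any(len(row) != width for row in layer):
            raise ValueError("all layers must share the same size")
        canvas = [
            [over(layer[y][x], canvas[y][x]) for x in range(width)]
            for y in range(height)
        ]
    return canvas


# A 1x1 example: opaque red background under a half-transparent blue layer.
background: Layer = [[(1.0, 0.0, 0.0, 1.0)]]
foreground: Layer = [[(0.0, 0.0, 1.0, 0.5)]]
flattened = composite([background, foreground])
```

Because each layer keeps its own alpha matte, a single layer can be swapped or edited and the stack recomposited without touching the others, which is the kind of editability the abstract describes.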
Peer Reviews
Decision·Submitted to ICLR 2026
+ Releases the first high-fidelity datasets for multi-layer transparent image generation, filling a major gap in the field.
+ The work employs rigorous artifact filtering, aesthetic scoring, and human selection to ensure dataset quality and diversity. The creation pipeline and the experience behind it are also valuable.
+ Fine-tuned models (ART+) trained on PrismLayersPro achieve better performance in both quantitative metrics and user studies compared with recent single-layer models.
- While high-quality, the datasets are primarily synthetic and may not fully capture the complexity or coherence of real-world multi-layer images.
- The accuracy of text rendering seems to be one of the evaluation dimensions for this work. But it seems that in both user studies and metrics. How about the text rendering quality/accuracy of the proposed work?
- The authors propose a series of datasets that include a significant number of layers, at different scales and quality levels. Unlike existing datasets, the proposed dataset presents a data scenario that generalizes better to layered synthesis in imaging scenarios.
- In addition to layers, the proposed dataset also provides a good source of text rendering data.
- The paper proposes a multi-stage pipeline for high-quality data generation. The proposed pipeline covers the qua
- The related work section is limited and does not cover the majority of layered synthesis work. While it is understandable not to include it in the main paper, such works should be acknowledged at least in the supplementary.
- From the data samples included in the dataset, even in the Real data split of the proposed data, the samples seem artificial and do not serve as real samples.
- As a primary use case of layered images, objects involving transparency properties have crucial importa
- High-quality and comprehensive open-source datasets. The paper constructs four Multi-Layer Transparent Image datasets, with the number of layers ranging up to 50. This makes substantial contributions to the field's advancement.
- Pipeline for constructing high-quality multi-layer transparent datasets. The paper proposes a novel pipeline that leverages FLUX, a powerful full-image generation model, to produce multi-layer transparent images.
- A new preference model for transparent image synthesis. A
- Lack of naturalness in fully synthesized datasets. Visualizations indicate the proposed dataset contains numerous cartoon elements that are generally easy to matte out from images. However, the paper does not propose a new approach to acquire transparent images from real-world scenes.
- Failure to address key challenges in multi-layered image generation. For multi-layered image generation, shadows, lighting, transparent objects (e.g., glass), and reflections on water or other surfaces are cri
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Historical Architecture and Urbanism
