LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition

Vlad-Constantin Lungu-Stan; Ionut Mironica; Mariana-Iuliana Georgescu

arXiv:2603.17965 · cs.CV · March 19, 2026

Open Access

TL;DR

LaDe is a latent diffusion framework that generates and decomposes multi-layered media designs from natural language prompts. Unlike existing methods, it produces a flexible number of semantically meaningful layers.

Contribution

LaDe introduces a unified, multi-task framework that combines an LLM-based prompt expander, a Latent Diffusion Transformer with 4D RoPE positional encoding, and an RGBA VAE decoder to generate and decompose layered media designs from text.
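To make the positional-encoding idea concrete, here is a minimal sketch of rotary position embedding (RoPE) extended to several axes. It is not LaDe's implementation: this summary does not say what the four axes index or how the channel budget is divided, so the equal split across axes, the function names `rope_1d` and `rope_nd`, and the `base` value are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def rope_1d(vec: Sequence[float], pos: float, base: float = 10000.0) -> List[float]:
    """Rotate consecutive channel pairs of `vec` by angles pos * base**(-i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("channel count must be even")
    out: List[float] = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_nd(vec: Sequence[float], coords: Sequence[float], base: float = 10000.0) -> List[float]:
    """Hypothetical N-axis RoPE: one equal channel chunk per coordinate axis.

    With four coordinates this is a 4D variant; the meaning of each axis
    is an assumption and is not taken from the paper.
    """
    n = len(coords)
    if n == 0 or len(vec) % (2 * n) != 0:
        raise ValueError("channel count must be a multiple of 2 * number of axes")
    chunk = len(vec) // n
    out: List[float] = []
    for axis, pos in enumerate(coords):
        out.extend(rope_1d(vec[axis * chunk:(axis + 1) * chunk], pos, base))
    return out


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))
```

The property that makes this kind of encoding attractive for attention is that the dot product of a rotated query and a rotated key depends only on the difference between their coordinates, so shifting both tokens by the same offset along every axis leaves the score unchanged.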

Findings

01. Outperforms Qwen-Image-Layered in text-to-layers generation

02. Improves text-to-layer alignment, as validated by GPT-4o mini and Qwen3-VL evaluators

03. Supports a flexible number of semantically meaningful layers

Abstract

Media design layer generation enables the creation of fully editable, layered design documents such as posters, flyers, and logos using only natural language prompts. Existing methods either restrict outputs to a fixed number of layers or require each layer to contain only spatially continuous regions, causing the layer count to scale linearly with design complexity. We propose LaDe (Layered Media Design), a latent diffusion framework that generates a flexible number of semantically meaningful layers. LaDe combines three components: an LLM-based prompt expander that transforms a short user intent into structured per-layer descriptions that guide the generation, a Latent Diffusion Transformer with a 4D RoPE positional encoding mechanism that jointly generates the full media design and its constituent RGBA layers, and an RGBA VAE that decodes each layer with full alpha-channel support. By…
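As an illustration of how a stack of RGBA layers relates to a full design, the sketch below flattens layers with the standard "over" alpha-compositing operator. The abstract does not state how LaDe combines its layers, so the use of "over", the back-to-front ordering, and the names `over` and `composite` are assumptions. Pixels are straight (non-premultiplied) RGBA values in [0, 1].

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # straight RGBA, each channel in [0, 1]
Layer = List[List[Pixel]]                  # rows of pixels

TRANSPARENT: Pixel = (0.0, 0.0, 0.0, 0.0)


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Composite one pixel over another with the standard 'over' operator."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return TRANSPARENT  # both pixels fully transparent

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: List[Layer]) -> Layer:
    """Flatten same-sized layers, listed back to front, into one RGBA image."""
    if not layers:
        raise ValueError("need at least one layer")
    height, width = len(layers[0]), len(layers[0][0])
    canvas: Layer = [[TRANSPARENT] * width for _ in range(height)]
    for layer in layers:
        canvas = [
            [over(layer[y][x], canvas[y][x]) for x in range(width)]
            for y in range(height)
        ]
    return canvas
```

Because every layer carries its own alpha channel, a single layer can be edited or removed and the design re-flattened without touching the others, which is what makes a layered document editable.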

Taxonomy

Topics: Interactive and Immersive Displays · Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship