LayerFlow: A Unified Model for Layer-aware Video Generation

Sihui Ji; Hao Luo; Xi Chen; Yuanpeng Tu; Yiyang Wang; Hengshuang Zhao

arXiv:2506.04228·cs.CV·June 5, 2025

TL;DR

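LayerFlow is a unified model that generates layered videos (transparent foreground, clean background, and blended scene) from per-layer prompts. It also supports decomposing a blended video and generating a background for a given foreground or vice versa, and it uses a multi-stage training strategy to cope with the scarcity of high-quality layered video data.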

Contribution

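The paper introduces LayerFlow, a unified framework for layer-aware video generation that handles several task variants (layered generation from prompts, decomposition of a blended video, and conditional foreground or background generation) within one model, trained with a multi-stage strategy.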

Findings

01 Supports decomposition of blended videos into layers.

02 Generates smooth videos with desired layers during inference.

03 Effectively trains with limited high-quality layered video data.

Abstract

We present LayerFlow, a unified solution for layer-aware video generation. Given per-layer prompts, LayerFlow generates videos for the transparent foreground, clean background, and blended scene. It also supports versatile variants like decomposing a blended video or generating the background for the given foreground and vice versa. Starting from a text-to-video diffusion transformer, we organize the videos for different layers as sub-clips, and leverage layer embeddings to distinguish each clip and the corresponding layer-wise prompts. In this way, we seamlessly support the aforementioned variants in one unified framework. For the lack of high-quality layer-wise training videos, we design a multi-stage training strategy to accommodate static images with high-quality layer annotations. Specifically, we first train the model with low-quality video data. Then, we tune a motion LoRA to…
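The sub-clip and layer-embedding idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer order, the function names `layer_embedding` and `tag_and_concat`, and the constant-valued toy embedding (a stand-in for a learned one) are all assumptions made for illustration. It only shows how tokens from each layer's sub-clip might be tagged with a per-layer vector and concatenated into one sequence, so a single transformer can tell the layers apart.

```python
from typing import Dict, List, Tuple

Token = List[float]

# Assumed layer order; the source names the three layers but not their order.
LAYER_ORDER = ["foreground", "background", "blended"]


def layer_embedding(layer_index: int, dim: int) -> Token:
    """Toy stand-in for a learned layer embedding (hypothetical)."""
    return [float(layer_index + 1)] * dim


def tag_and_concat(
    per_layer_tokens: Dict[str, List[Token]], dim: int
) -> Tuple[List[Token], List[int]]:
    """Add each layer's embedding to its tokens, then join all sub-clips.

    Returns the concatenated token sequence and, for every token,
    the index of the layer it came from.
    """
    sequence: List[Token] = []
    layer_ids: List[int] = []
    for idx, name in enumerate(LAYER_ORDER):
        emb = layer_embedding(idx, dim)
        for tok in per_layer_tokens[name]:
            sequence.append([t + e for t, e in zip(tok, emb)])
            layer_ids.append(idx)
    return sequence, layer_ids


# Usage: two zero-valued tokens of width 3 per layer.
clips = {name: [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]] for name in LAYER_ORDER}
tokens, ids = tag_and_concat(clips, dim=3)
```

The same tagging could be applied to the tokens of each layer-wise prompt, which is how the abstract describes pairing a clip with its prompt; that pairing is left out here to keep the sketch short.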

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Visual Attention and Saliency Detection