Blended Latent Diffusion under Attention Control for Real-World Video Editing

Deyin Liu; Lin Yuanbo Wu; Xianghua Xie

arXiv:2409.03514·cs.CV·September 6, 2024

Open Access

TL;DR

This paper introduces a novel method for local real-world video editing by adapting a latent diffusion model with autonomous masking and enhanced temporal consistency, addressing limitations of existing approaches.

Contribution

It presents a blended latent diffusion framework with autonomous masking and temporal-spatial attention blocks for improved local video editing.
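
As a rough illustration of the blending idea that gives blended latent diffusion its name, the sketch below keeps edited latent values inside a mask and background values outside it, frame by frame. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `blend_latents` and `blend_video` are hypothetical, the mask is taken as given (the paper's autonomous masking is not reproduced here), and each latent is reduced to a single 2-D channel of plain Python floats.

```python
from typing import List

# Assumption: one latent channel per frame, stored as a 2-D list of floats.
Latent = List[List[float]]


def blend_latents(edited: Latent, background: Latent, mask: Latent) -> Latent:
    """Keep `edited` where mask is 1 and `background` where mask is 0."""
    return [
        [m * e + (1.0 - m) * b for e, b, m in zip(e_row, b_row, m_row)]
        for e_row, b_row, m_row in zip(edited, background, mask)
    ]


def blend_video(
    edited_frames: List[Latent],
    background_frames: List[Latent],
    masks: List[Latent],
) -> List[Latent]:
    """Apply the same per-frame blend to every frame of a clip."""
    return [
        blend_latents(e, b, m)
        for e, b, m in zip(edited_frames, background_frames, masks)
    ]


# Usage: a 2x2 latent where the mask selects the diagonal for editing.
edited = [[1.0, 1.0], [1.0, 1.0]]
background = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1.0, 0.0], [0.0, 1.0]]
blended = blend_latents(edited, background, mask)
```

In a diffusion setting this blend would typically be applied at each denoising step, with the background latent noised to the matching step; that scheduling is omitted here.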

Findings

01 Effective in preserving background details.

02 Autonomous mask generation improves editing efficiency.

03 Enhanced temporal consistency across frames.

Abstract

Due to the lack of fully publicly available text-to-video models, current video editing methods tend to build on pre-trained text-to-image generation models. However, they still face major challenges in the local editing of video with temporal information. First, although existing methods attempt to focus on local-area editing with a pre-defined mask, the background outside that area is poorly preserved because each frame is generated in its spatial entirety. In addition, requiring the user to provide a mask is an extra, costly step, so an autonomous masking strategy integrated into the editing process is desirable. Last but not least, an image-level pretrained model has not learned temporal information across the frames of a video, which is vital for expressing motion and dynamics. In this paper, we propose to adapt an image-level blended latent diffusion model to…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Max Pooling · Convolution · Diffusion · Latent Diffusion Model · Concatenated Skip Connection · U-Net · Focus