Blended Latent Diffusion under Attention Control for Real-World Video Editing
Deyin Liu, Lin Yuanbo Wu, Xianghua Xie

TL;DR
This paper introduces a novel method for local real-world video editing by adapting a latent diffusion model with autonomous masking and enhanced temporal consistency, addressing limitations of existing approaches.
Contribution
It presents a blended latent diffusion framework with autonomous masking and temporal-spatial attention blocks for improved local video editing.
Findings
Effective in preserving background details.
Autonomous mask generation improves editing efficiency.
Enhanced temporal consistency across frames.
Abstract
Due to the lack of fully publicly available text-to-video models, current video editing methods tend to build on pre-trained text-to-image generation models; however, they still face major challenges in locally editing video with temporal information. First, although existing methods attempt to focus on local-area editing via a pre-defined mask, preservation of the background outside the mask is imperfect because each frame is generated over its entire spatial extent. In addition, requiring the user to supply a mask is an additional cost, so an autonomous masking strategy integrated into the editing process is desirable. Finally, image-level pre-trained models have not learned temporal information across the frames of a video, which is vital for expressing motion and dynamics. In this paper, we propose to adapt an image-level blended latent diffusion model to…
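The background-preservation issue raised in the abstract is what mask-based latent blending is generally meant to address: inside the mask the edited latent is kept, and outside it the source latent is restored. The following is a minimal sketch of that blending idea only, not the authors' implementation. The function name `blend_latents`, the toy shapes, and the hand-written centre mask are all illustrative assumptions; the paper's autonomous masking and temporal-spatial attention blocks are not modelled here.

```python
def blend_latents(edited, source, mask):
    """Per-element blend: keep `edited` where mask is 1, `source` where it is 0.

    All three arguments are lists of frames; each frame is a 2D list
    (height x width) of floats. This layout is an assumption for illustration.
    """
    return [
        [
            [m * e + (1.0 - m) * s for e, s, m in zip(e_row, s_row, m_row)]
            for e_row, s_row, m_row in zip(e_frame, s_frame, m_frame)
        ]
        for e_frame, s_frame, m_frame in zip(edited, source, mask)
    ]


# Toy data: 2 frames of 3x3 single-channel "latents".
source = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
edited = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]

# Hypothetical mask: only the centre position is editable in every frame.
mask = [
    [[1.0 if (r, c) == (1, 1) else 0.0 for c in range(3)] for r in range(3)]
    for _ in range(2)
]

blended = blend_latents(edited, source, mask)
```

In this toy run the centre position of each frame takes the edited value and every other position keeps the source value, which is the sense in which blending protects the background outside the edited region.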
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: Max Pooling · Convolution · Diffusion · Latent Diffusion Model · Concatenated Skip Connection · U-Net · Focus
