COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing

Jiangshan Wang; Yue Ma; Jiayi Guo; Yicheng Xiao; Gao Huang; Xiu Li

arXiv:2406.08850 · cs.CV · December 10, 2024

TL;DR

COVE is a video editing method that exploits the correspondence already present in pre-trained diffusion features. A sliding-window similarity search links corresponding tokens across frames, and token merging keeps the computation efficient, yielding temporally coherent, high-quality edits without any additional training.
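The TL;DR names token merging as one of COVE's efficiency levers. This page does not say which merging variant COVE uses, so the sketch below assumes a Token Merging (ToMe)-style bipartite soft matching over a set of token vectors: tokens are split alternately into two sets, each token in the first set finds its most similar partner in the second, and the `r` most redundant pairs are averaged together. The function name `bipartite_merge` and the parameter `r` are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def bipartite_merge(tokens: np.ndarray, r: int) -> np.ndarray:
    """Hypothetical ToMe-style merge: reduce (N, C) tokens to (N - r, C).

    Assumption: generic bipartite soft matching, not COVE's exact code.
    """
    a, b = tokens[::2], tokens[1::2]  # alternate split into sets A and B
    r = min(r, len(a))  # at most every A token can be merged

    # Cosine similarity between every A token and every B token.
    an = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    bn = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    sim = an @ bn.T  # (len(A), len(B))

    best_b = sim.argmax(axis=1)  # most similar B partner for each A token
    order = np.argsort(-sim.max(axis=1), kind="stable")  # most redundant A tokens first
    merged_a, kept_a = order[:r], np.sort(order[r:])

    # Average each merged A token into its B partner (count-weighted mean).
    sums = b.astype(float).copy()
    counts = np.ones(len(b))
    for i in merged_a:
        sums[best_b[i]] += a[i]
        counts[best_b[i]] += 1
    merged_b = sums / counts[:, None]

    return np.concatenate([a[kept_a], merged_b], axis=0)


tokens = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 1.0]])
print(bipartite_merge(tokens, r=1).shape)  # (3, 2)
```

Here the two identical `[1.0, 0.0]` tokens are the most redundant pair, so they collapse into one, shrinking the set that a subsequent attention step would have to process.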

Contribution

The paper presents a new diffusion feature correspondence approach for video editing, enabling temporal consistency and efficiency without extra training or optimization.

Findings

01

Achieves state-of-the-art performance in various video editing scenarios.

02

Outperforms existing methods both quantitatively and qualitatively.

03

Reduces GPU memory usage and accelerates the editing process.

Abstract

Video editing is an emerging task, in which most current methods adopt the pre-trained text-to-image (T2I) diffusion model to edit the source video in a zero-shot manner. Despite extensive efforts, maintaining the temporal consistency of edited videos remains challenging due to the lack of temporal constraints in the regular T2I diffusion model. To address this issue, we propose COrrespondence-guided Video Editing (COVE), leveraging the inherent diffusion feature correspondence to achieve high-quality and consistent video editing. Specifically, we propose an efficient sliding-window-based strategy to calculate the similarity among tokens in the diffusion features of source videos, identifying the tokens with high correspondence across frames. During the inversion and denoising process, we sample the tokens in noisy latent based on the correspondence and then perform self-attention…
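The abstract describes two steps: a sliding-window search over the source video's diffusion features that links each token to its counterpart in other frames, and self-attention restricted to those corresponding tokens during inversion and denoising. Because the abstract is truncated, the details below are assumptions: features are an `(F, H, W, C)` array, each frame's match is searched in a square window of hypothetical half-width `radius` centred on the previous frame's match, similarity is cosine similarity, and attention uses the raw token vectors as queries, keys, and values with no learned projections. All function names are hypothetical; this is a minimal NumPy sketch, not the official implementation.

```python
import numpy as np


def cosine_sim(query: np.ndarray, cands: np.ndarray) -> np.ndarray:
    """Cosine similarity between one (C,) vector and N candidates of shape (N, C)."""
    q = query / (np.linalg.norm(query) + 1e-8)
    c = cands / (np.linalg.norm(cands, axis=-1, keepdims=True) + 1e-8)
    return c @ q


def track_token(feats: np.ndarray, y: int, x: int, radius: int = 1) -> list:
    """Follow token (y, x) of frame 0 through every frame with a sliding window.

    feats: (F, H, W, C) diffusion features of the source video (assumed layout).
    Returns one (y, x) position per frame.
    """
    num_frames, height, width, channels = feats.shape
    path = [(y, x)]
    for f in range(1, num_frames):
        py, px = path[-1]
        query = feats[f - 1, py, px]
        # Search only a (2*radius+1)^2 window around the previous match,
        # instead of all height*width tokens of the next frame.
        y0, y1 = max(py - radius, 0), min(py + radius + 1, height)
        x0, x1 = max(px - radius, 0), min(px + radius + 1, width)
        window = feats[f, y0:y1, x0:x1].reshape(-1, channels)
        best = int(np.argmax(cosine_sim(query, window)))
        wy, wx = divmod(best, x1 - x0)  # window was flattened row-major
        path.append((y0 + wy, x0 + wx))
    return path


def correspondence_attention(latent: np.ndarray, path: list, frame: int) -> np.ndarray:
    """Self-attention for one token, restricted to its corresponding tokens in all frames."""
    tokens = np.stack([latent[f, y, x] for f, (y, x) in enumerate(path)])  # (F, C)
    q = latent[frame, path[frame][0], path[frame][1]]
    scores = tokens @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ tokens


# Toy video: a distinctive token moves one step diagonally per frame.
feats = np.tile(np.array([0.0, 1.0]), (3, 4, 4, 1))
for f in range(3):
    feats[f, f, f] = [1.0, 0.0]
path = track_token(feats, 0, 0)
print(path)  # [(0, 0), (1, 1), (2, 2)]
print(correspondence_attention(feats, path, frame=0))
```

Under these assumptions, the local window is what keeps the correspondence step cheap: each token is compared against at most (2·radius+1)² candidates per frame rather than all H·W tokens.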

Code & Models

Repositories

wangjiangshan0725/cove · PyTorch · Official

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies

Methods: Sigmoid Activation · Tanh Activation · Location-based Attention · Long Short-Term Memory · Softmax · GloVe Embeddings · Sequence to Sequence · Diffusion · Bidirectional LSTM · Contextual Word Vectors