Edit Temporal-Consistent Videos with Image Diffusion Model

Yuanzhi Wang; Yong Li; Xiaoya Zhang; Xin Liu; Anbo Dai; Antoni B. Chan; Zhen Cui

arXiv:2308.09091·cs.CV·August 29, 2024·1 cites

Open Access 1 Repo

TL;DR

This paper introduces TCVE, a novel method that combines spatial and temporal Unets to improve temporal consistency in text-guided video editing, achieving state-of-the-art results.

Contribution

The paper proposes a dedicated temporal Unet that works alongside a pretrained T2I 2D Unet, plus a spatial-temporal modeling unit that links the spatial-focused and temporal-focused components, to improve temporal coherence in diffusion-based video editing.

Findings

1. TCVE outperforms existing methods in temporal consistency.
2. The approach maintains high-quality content manipulation.
3. Quantitative results show state-of-the-art performance.

Abstract

Large-scale text-to-image (T2I) diffusion models have been extended for text-guided video editing, yielding impressive zero-shot video editing performance. Nonetheless, the generated videos usually show spatial irregularities and temporal inconsistencies as the temporal characteristics of videos have not been faithfully modeled. In this paper, we propose an elegant yet effective Temporal-Consistent Video Editing (TCVE) method to mitigate the temporal inconsistency challenge for robust text-guided video editing. In addition to the utilization of a pretrained T2I 2D Unet for spatial content manipulation, we establish a dedicated temporal Unet architecture to faithfully capture the temporal coherence of the input video sequences. Furthermore, to establish coherence and interrelation between the spatial-focused and temporal-focused components, a cohesive spatial-temporal modeling unit is…
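The split between a spatial branch and a temporal branch described in the abstract can be illustrated with a small NumPy sketch. It is not the TCVE implementation: the function names, the identity stand-in for the 2D Unet, the moving-average stand-in for a temporal layer, and the weighted-sum fusion controlled by `alpha` are all assumptions made for illustration. The sketch only shows how one video feature tensor of shape (B, C, T, H, W) can be viewed either as a batch of independent frames or as a batch of per-pixel time sequences, and then recombined.

```python
import numpy as np

def spatial_view(x):
    """(B, C, T, H, W) -> (B*T, C, H, W): every frame becomes an independent image."""
    b, c, t, h, w = x.shape
    return x.transpose(0, 2, 1, 3, 4).reshape(b * t, c, h, w)

def from_spatial(y, b, t):
    """Inverse of spatial_view."""
    _, c, h, w = y.shape
    return y.reshape(b, t, c, h, w).transpose(0, 2, 1, 3, 4)

def temporal_view(x):
    """(B, C, T, H, W) -> (B*H*W, C, T): every pixel location becomes a 1D sequence over time."""
    b, c, t, h, w = x.shape
    return x.transpose(0, 3, 4, 1, 2).reshape(b * h * w, c, t)

def from_temporal(y, b, h, w):
    """Inverse of temporal_view."""
    _, c, t = y.shape
    return y.reshape(b, h, w, c, t).transpose(0, 3, 4, 1, 2)

def temporal_smooth(seq):
    """Hypothetical stand-in for a temporal layer: 3-tap moving average over time, edge-padded."""
    padded = np.pad(seq, ((0, 0), (0, 0), (1, 1)), mode="edge")
    return (padded[..., :-2] + padded[..., 1:-1] + padded[..., 2:]) / 3.0

def fuse(x, alpha=0.5):
    """Assumed fusion rule: weighted sum of the spatial and temporal branch outputs."""
    b, c, t, h, w = x.shape
    spatial = from_spatial(spatial_view(x), b, t)  # identity stand-in for the 2D Unet
    temporal = from_temporal(temporal_smooth(temporal_view(x)), b, h, w)
    return alpha * spatial + (1.0 - alpha) * temporal

video = np.random.default_rng(0).normal(size=(1, 4, 8, 16, 16))
print(fuse(video).shape)  # (1, 4, 8, 16, 16)
```

In a real model the two stand-ins would be learned networks, and the fusion would be whatever the paper's spatial-temporal modeling unit specifies; the reshapes are the part that carries over.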

Code & Models

Repositories

mdswyz/TCVE (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging

Methods: Diffusion