Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

Wen Wang; Yan Jiang; Kangyang Xie; Zide Liu; Hao Chen; Yue Cao; Xinlong Wang; Chunhua Shen

arXiv:2303.17599 · cs.CV · January 5, 2024 · 23 cites



TL;DR

This paper introduces vid2vid-zero, a zero-shot video editing method that uses pre-existing image diffusion models without training on videos, achieving consistent and high-quality edits in real-world videos.

Contribution

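It presents a zero-shot video editing approach that builds on off-the-shelf image diffusion models and requires no video training. Three modules handle the main requirements: null-text inversion for text-video alignment, cross-frame modeling for temporal consistency, and spatial regularization for fidelity to the original video.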

Findings

01. Effective zero-shot editing of real-world videos.
02. Maintains temporal consistency across frames.
03. Enables editing of attributes, subjects, and scenes.

Abstract

Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which is often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method is a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and…
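The abstract credits the dynamic nature of attention for bi-directional temporal modeling without training. Below is a minimal NumPy sketch of one way to read that claim. It assumes a frame-wise self-attention layer is reused so that each frame's queries attend to the keys and values of every frame in the clip, both earlier and later. The function name `cross_frame_attention` and the `(frames, tokens, dim)` shapes are hypothetical and are not taken from the baaivision/vid2vid-zero code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_frame_attention(q, k, v):
    """Hypothetical bi-directional cross-frame attention (illustrative sketch).

    q, k, v: arrays of shape (frames, tokens, dim), i.e. the projections a
    frame-wise self-attention layer would already compute. Every frame's
    queries attend to keys/values pooled from ALL frames, so no new weights
    are introduced and no training is required.
    """
    f, n, d = k.shape
    # Pool keys and values across frames: (frames * tokens, dim)
    k_all = k.reshape(f * n, d)
    v_all = v.reshape(f * n, d)
    # Scaled dot-product scores: (frames, tokens, frames * tokens)
    scores = q @ k_all.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Weighted sum of pooled values: (frames, tokens, dim)
    return weights @ v_all


# Usage: a toy clip of 4 frames, 16 tokens per frame, 8 channels
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 16, 8)) for _ in range(3))
out = cross_frame_attention(q, k, v)
print(out.shape)  # (4, 16, 8)
```

Because only the set of keys and values is widened, pretrained projection weights can be reused unchanged; the trade-off is that attention cost grows quadratically with the number of frames. With a single frame the function reduces to ordinary self-attention.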


Code & Models

Repositories

baaivision/vid2vid-zero
PyTorch · Official


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Test · Diffusion