Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models

Hyeonho Jeong; Jong Chul Ye

arXiv:2310.01107 · cs.CV · February 27, 2024 · 5 cites

TL;DR

Ground-A-Video is a training-free, zero-shot framework for multi-attribute, temporally consistent video editing guided by grounding information. It outperforms baseline methods in edit accuracy and frame consistency.

Contribution

The paper introduces a grounding-guided, training-free video editing method, centered on Cross-Frame Gated Attention, for multi-attribute editing.

Findings

01 Outperforms baseline methods in edit accuracy

02 Achieves superior frame consistency

03 Operates in a zero-shot, training-free manner

Abstract

Recent endeavors in video editing have showcased promising results in single-attribute editing or style transfer tasks, either by training text-to-video (T2V) models on text-video data or by adopting training-free methods. However, when confronted with the complexities of multi-attribute editing scenarios, they exhibit shortcomings such as omitting or overlooking intended attribute changes, modifying the wrong elements of the input video, and failing to preserve regions of the input video that should remain intact. To address this, here we present a novel grounding-guided video-to-video translation framework called Ground-A-Video for multi-attribute video editing. Ground-A-Video attains temporally consistent multi-attribute editing of input videos in a training-free manner without the aforementioned shortcomings. Central to our method is the introduction of Cross-Frame Gated Attention which…

Code & Models

Repositories

ground-a-video/ground-a-video (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Multimodal Machine Learning Applications