Re-Attentional Controllable Video Diffusion Editing

Yuanzhi Wang; Yong Li; Mengyi Liu; Xiaoya Zhang; Xin Liu; Zhen Cui; Antoni B. Chan

arXiv:2412.11710 · cs.CV · December 17, 2024

TL;DR

This paper introduces ReAtCo, a novel method for controllable video editing using diffusion models, which improves spatial alignment and preserves invariant regions, resulting in more accurate and high-fidelity edited videos.

Contribution

The paper proposes Re-Attentional Diffusion and Invariant Region-guided Joint Sampling strategies to enhance controllability and fidelity in text-guided video diffusion editing without additional training.

Findings

01. ReAtCo improves spatial alignment of edited objects.

02. ReAtCo reduces border artifacts in invariant regions.

03. ReAtCo achieves superior editing performance compared to existing methods.

Abstract

Editing videos with textual guidance has gained popularity because of its streamlined process: users only need to edit the text prompt corresponding to the source video. Recent studies have exploited large-scale text-to-image diffusion models for text-guided video editing, resulting in remarkable video editing capabilities. However, these methods may still suffer from limitations such as mislocated objects and an incorrect number of objects, so the controllability of video editing remains a formidable challenge. In this paper, we aim to address these limitations by proposing a Re-Attentional Controllable Video Diffusion Editing (ReAtCo) method. Specifically, to align the spatial placement of the target objects with the edited text prompt in a training-free manner, we propose a Re-Attentional Diffusion (RAD) to refocus the cross-attention activation responses…
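The abstract is cut off before RAD is described in detail, so the following is only a generic illustration of what "refocusing" a cross-attention response toward a target region can mean; it is not the paper's formulation. It assumes a single token's cross-attention map and a binary region mask, both given as nested lists, and the names `region_attention_ratio` and `refocus_loss` are hypothetical. In a training-free guidance setting, one plausible use would be to lower such a loss by adjusting the latent during sampling, which is likewise an assumption here.

```python
def region_attention_ratio(attn_map, region_mask):
    """Fraction of one token's cross-attention mass inside the target region.

    attn_map and region_mask are same-shaped 2D lists; the mask holds 0/1.
    (Hypothetical helper for illustration only.)
    """
    total = sum(sum(row) for row in attn_map)
    inside = sum(
        a * m
        for a_row, m_row in zip(attn_map, region_mask)
        for a, m in zip(a_row, m_row)
    )
    return inside / total if total > 0 else 0.0


def refocus_loss(attn_map, region_mask):
    """Assumed objective: 0 when all attention lies inside the region,
    1 when none of it does."""
    return (1.0 - region_attention_ratio(attn_map, region_mask)) ** 2


# Toy 2x2 attention map where the right column is the target region.
attn = [[0.1, 0.4],
        [0.1, 0.4]]
mask = [[0, 1],
        [0, 1]]
ratio = region_attention_ratio(attn, mask)
loss = refocus_loss(attn, mask)
```

In this toy case most of the attention already lies in the target column, so the loss is small; an attention map concentrated outside the mask would give a loss close to 1.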

Taxonomy

Topics: Video Coding and Compression Technologies · Multimedia Communication and Technology · Video Analysis and Summarization

Methods: Diffusion · ALIGN