VideoDirector: Precise Video Editing via Text-to-Video Models

Yukun Wang; Longguang Wang; Zhiyuan Ma; Qibin Hu; Kai Xu; Yulan Guo

arXiv:2411.17592·cs.CV·March 20, 2025

Open Access

TL;DR

VideoDirector is a method for precise video editing with text-to-video (T2V) models. It reduces artifacts such as color flickering and content distortion by decoupling spatial-temporal information and controlling attention, and it reports state-of-the-art results.

Contribution

The paper proposes spatial-temporal decoupled guidance (STDG) and multi-frame null-text optimization to improve inversion and editing in T2V models.

Findings

01 Effective disentanglement of spatial-temporal information

02 Enhanced fidelity and content preservation in edited videos

03 State-of-the-art accuracy and motion smoothness

Abstract

Although the typical inversion-then-editing paradigm using text-to-image (T2I) models has demonstrated promising results, directly extending it to text-to-video (T2V) models still suffers from severe artifacts such as color flickering and content distortion. Consequently, current video editing methods primarily rely on T2I models, which inherently lack the ability to generate temporally coherent content and therefore often produce inferior editing results. In this paper, we attribute the failure of the typical editing paradigm to: 1) Tightly Spatial-temporal Coupling. The vanilla pivotal-based inversion strategy struggles to disentangle spatial-temporal information in the video diffusion model; 2) Complicated Spatial-temporal Layout. The vanilla cross-attention control is deficient in preserving the unedited content. To address these limitations, we propose a spatial-temporal decoupled guidance (STDG) and…

Taxonomy

Topics: Multimedia Communication and Technology · Video Analysis and Summarization

Methods: Diffusion