VideoDirector: Precise Video Editing via Text-to-Video Models
Yukun Wang, Longguang Wang, Zhiyuan Ma, Qibin Hu, Kai Xu, Yulan Guo

TL;DR
VideoDirector introduces a method for precise video editing directly with text-to-video models. It reduces artifacts such as color flickering and content distortion by decoupling spatial-temporal information and controlling attention, and reports state-of-the-art results.
Contribution
The paper proposes spatial-temporal decoupled guidance and multi-frame null-text optimization for improved inversion and editing in T2V models, addressing key limitations of existing methods.
Findings
Effective disentanglement of spatial-temporal information
Enhanced fidelity and content preservation in edited videos
State-of-the-art accuracy and motion smoothness
Abstract
Although the typical inversion-then-editing paradigm using text-to-image (T2I) models has demonstrated promising results, directly extending it to text-to-video (T2V) models still suffers from severe artifacts such as color flickering and content distortion. Consequently, current video editing methods primarily rely on T2I models, which inherently lack the ability to generate temporally coherent content, often resulting in inferior editing results. In this paper, we attribute the failure of the typical editing paradigm to: 1) Tight spatial-temporal coupling. The vanilla pivotal-based inversion strategy struggles to disentangle spatial-temporal information in the video diffusion model; 2) Complicated spatial-temporal layout. The vanilla cross-attention control is deficient in preserving the unedited content. To address these limitations, we propose a spatial-temporal decoupled guidance (STDG) and…
Taxonomy
Topics: Multimedia Communication and Technology · Video Analysis and Summarization
Methods: Diffusion
