DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing

Lingling Cai; Kang Zhao; Hangjie Yuan; Xiang Wang; Yingya Zhang; Kejie Huang

arXiv:2506.20967 · cs.CV · June 30, 2025

Open Access

TL;DR

DFVEdit is a fast, zero-shot video editing method for Video Diffusion Transformers that operates on latents via flow transformation, significantly reducing computational costs while maintaining high editing quality.

Contribution

It introduces the Conditional Delta Flow Vector (CDFV), enabling efficient, unbiased flow-based editing without attention modification or fine-tuning.

Findings

01. At least 20x inference speed-up compared to existing methods.
02. 85% memory reduction during editing.
03. State-of-the-art performance on structural fidelity and consistency.

Abstract

The advent of Video Diffusion Transformers (Video DiTs) marks a milestone in video generation. However, directly applying existing video editing methods to Video DiTs often incurs substantial computational overhead, due to resource-intensive attention modification or fine-tuning. To alleviate this problem, we present DFVEdit, an efficient zero-shot video editing method tailored for Video DiTs. DFVEdit eliminates the need for both attention modification and fine-tuning by operating directly on clean latents via flow transformation. Specifically, we observe that editing and sampling can be unified under the continuous flow perspective. Building on this foundation, we propose the Conditional Delta Flow Vector (CDFV), a theoretically unbiased estimation of the Delta Flow Vector (DFV), and integrate Implicit Cross Attention (ICA) guidance as well as Embedding Reinforcement (ER) to further enhance…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Visual Attention and Saliency Detection

Methods: Diffusion