Consistent Video Editing as Flow-Driven Image-to-Video Generation

Ge Wang; Songlin Fan; Hangxu Liu; Quanjian Song; Hewei Wang; Jinfeng Xu

arXiv:2506.07713 · cs.CV · June 16, 2025

TL;DR

This paper introduces FlowV2V, a flow-driven image-to-video generation method that improves complex motion modeling and temporal consistency in video editing, especially for non-rigid objects, outperforming existing methods.

Contribution

FlowV2V redefines video editing as flow-driven I2V generation, effectively modeling complex motions and ensuring temporal consistency through flow alignment and first-frame editing.

Findings

01  Achieves 13.67% improvement on DOVER metric

02  Reduces warping error by 50.66%

03  Outperforms state-of-the-art methods in temporal consistency

Abstract

With the rise of video diffusion models, downstream applications such as video editing have advanced significantly without incurring much computational cost. One particular challenge in this task lies in transferring motion from the source video to the edited one, which requires accounting for the shape deformation between them while maintaining temporal consistency in the generated video sequence. However, existing methods fail to model complicated motion patterns for video editing and are fundamentally limited to object replacement, while tasks with non-rigid object motions, such as multi-object and portrait editing, are largely neglected. In this paper, we observe that optical flows offer a promising alternative for complex motion modeling, and present FlowV2V to re-investigate video editing as a task of flow-driven Image-to-Video (I2V) generation.…
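
To make the role of optical flow concrete, here is a minimal sketch of how a dense flow field can carry one frame onto the next, and how a simple warping error can be computed from it. This is not FlowV2V's implementation, and the paper's exact warping-error definition is not given on this page. The function names `warp_with_flow` and `warping_error`, the backward-flow convention, nearest-neighbor sampling, and the absence of an occlusion mask are all assumptions made for illustration.

```python
import numpy as np


def warp_with_flow(frame, flow):
    """Backward-warp `frame` with a dense flow field (nearest-neighbor sampling).

    Assumed convention (hypothetical, for illustration only):
    flow[y, x] = (dx, dy) points from pixel (x, y) in the target frame
    to the location in `frame` that it should be sampled from.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Round to the nearest source pixel and clamp to the image borders.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]


def warping_error(prev_frame, next_frame, flow):
    """Mean squared difference between next_frame and the warped prev_frame.

    A simplified form: no occlusion mask is applied, so border and
    occluded pixels contribute to the error as well.
    """
    warped = warp_with_flow(prev_frame, flow).astype(np.float64)
    return float(np.mean((next_frame.astype(np.float64) - warped) ** 2))


# Toy example: the "next" frame is the previous frame shifted right by one pixel.
frame = np.arange(16, dtype=np.float64).reshape(4, 4)
next_frame = np.roll(frame, 1, axis=1)

# Each target pixel samples from one pixel to its left in the previous frame.
shift_flow = np.zeros((4, 4, 2))
shift_flow[..., 0] = -1.0
zero_flow = np.zeros((4, 4, 2))

err_with_flow = warping_error(frame, next_frame, shift_flow)
err_without_flow = warping_error(frame, next_frame, zero_flow)
```

In this toy case the flow-compensated error is lower than the uncompensated one, which is the intuition behind using warping error as a temporal-consistency measure: frames that follow the estimated motion closely warp onto each other with little residual.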

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis