Consistent Video Editing as Flow-Driven Image-to-Video Generation
Ge Wang, Songlin Fan, Hangxu Liu, Quanjian Song, Hewei Wang, Jinfeng Xu

TL;DR
This paper introduces FlowV2V, a flow-driven image-to-video generation method that improves complex motion modeling and temporal consistency in video editing, especially for non-rigid objects, outperforming existing methods.
Contribution
FlowV2V redefines video editing as flow-driven I2V generation, effectively modeling complex motions and ensuring temporal consistency through flow alignment and first-frame editing.
Findings
Achieves a 13.67% improvement on the DOVER metric
Reduces warping error by 50.66%
Outperforms state-of-the-art methods in temporal consistency
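Warping error is a common temporal-consistency metric. Each frame is compared against the next frame warped back along optical flow, and a lower value means less flicker between frames. This listing does not state the paper's exact protocol: which flow estimator is used, how occlusions are handled, and the color space and scaling are all unspecified. The sketch below is therefore a minimal version under assumptions. Flows are supplied as a `(T-1, H, W, 2)` array of forward `(dx, dy)` displacements from frame t to frame t+1. Only pixels that warp out of bounds are masked. The error is the mean squared difference. The helper names `bilinear_sample`, `warp_backward`, and `warping_error` are hypothetical.

```python
import numpy as np


def bilinear_sample(img, xs, ys):
    """Bilinearly sample img (H, W, C) at float coords xs, ys (each H x W).

    Returns the sampled values and a boolean mask of in-bounds coordinates.
    """
    h, w = img.shape[:2]
    valid = (xs >= 0) & (xs <= w - 1) & (ys >= 0) & (ys <= h - 1)
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (xs - x0)[..., None]  # horizontal interpolation weight
    wy = (ys - y0)[..., None]  # vertical interpolation weight
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy, valid


def warp_backward(next_frame, flow):
    """Warp frame t+1 into frame t's coordinates using forward flow t -> t+1."""
    h, w = flow.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return bilinear_sample(next_frame, xs + flow[..., 0], ys + flow[..., 1])


def warping_error(frames, flows):
    """Mean squared warping error over a clip.

    frames: (T, H, W, C) float array; flows: (T-1, H, W, 2) forward flows.
    Assumption: only out-of-bounds pixels are masked (no occlusion test).
    """
    errors = []
    for t in range(len(frames) - 1):
        warped, valid = warp_backward(frames[t + 1], flows[t])
        sq_diff = ((frames[t] - warped) ** 2).mean(axis=-1)  # average over channels
        if valid.any():
            errors.append(sq_diff[valid].mean())
    return float(np.mean(errors)) if errors else 0.0
```

For example, if frame t+1 is frame t shifted one pixel to the right, a flow of `(1, 0)` at every pixel warps it back exactly and the error is zero. A zero flow leaves the shift uncorrected and gives a positive error. Published evaluations typically estimate flows with a learned flow network and also mask occluded pixels, so absolute values from this sketch are not comparable to the reported numbers.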
Abstract
With the rise of video diffusion models, downstream applications such as video editing have advanced significantly without requiring much computational cost. A key challenge in this task lies in transferring motion from the source video to the edited one. This requires accounting for the shape deformation between the two while maintaining temporal consistency in the generated video sequence. However, existing methods fail to model complicated motion patterns for video editing and are fundamentally limited to object replacement. Tasks with non-rigid object motion, such as multi-object and portrait editing, are largely neglected. In this paper, we observe that optical flows offer a promising alternative for complex motion modeling, and present FlowV2V, which re-investigates video editing as a task of flow-driven Image-to-Video (I2V) generation.…
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
