FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu

TL;DR
FlowVid is a video-to-video synthesis method that keeps frames temporally consistent while tolerating imperfect optical flow: the first frame is edited with an existing image-to-image model, and the edit is propagated to the following frames using flow as a supplementary reference rather than a hard constraint.
Contribution
The paper presents FlowVid, a novel V2V synthesis framework that effectively handles optical flow imperfections and enables seamless editing of the first frame for consistent video generation.
Findings
FlowVid is 3.1x to 10.5x faster than the prior methods it is compared against (CoDeF, Rerender, and TokenFlow).
It outperforms existing models in user preference studies.
FlowVid supports various modifications like stylization and object swaps.
Abstract
Diffusion models have transformed the image-to-image (I2I) synthesis and are now permeating into videos. However, the advancement of video-to-video (V2V) synthesis has been hampered by the challenge of maintaining temporal consistency across video frames. This paper proposes a consistent V2V synthesis framework by jointly leveraging spatial conditions and temporal optical flow clues within the source video. Contrary to prior methods that strictly adhere to optical flow, our approach harnesses its benefits while handling the imperfection in flow estimation. We encode the optical flow via warping from the first frame and serve it as a supplementary reference in the diffusion model. This enables our model for video synthesis by editing the first frame with any prevalent I2I models and then propagating edits to successive frames. Our V2V model, FlowVid, demonstrates remarkable properties:…
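The warping step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `warp_first_frame`, the backward-flow convention, and the nearest-neighbor sampling are all assumptions made for illustration, and the real system feeds the warped frames to a diffusion model as a supplementary reference rather than using them directly. The sketch only shows how a first frame could be pulled toward a later frame given a flow field, and how pixels with no valid source can be flagged instead of trusted.

```python
def warp_first_frame(first_frame, backward_flow, fill=0):
    """Backward-warp the first frame toward a later frame (illustrative only).

    first_frame:   H x W grid (list of lists) of pixel values.
    backward_flow: H x W grid of (dx, dy) tuples. For the pixel at (x, y) in
                   the later frame, (x + dx, y + dy) is its estimated source
                   location in the first frame (an assumed convention).
    Returns (warped, valid): the warped grid and a boolean grid that is False
    wherever the source location falls outside the first frame.
    """
    height = len(first_frame)
    width = len(first_frame[0])
    warped = [[fill] * width for _ in range(height)]
    valid = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = backward_flow[y][x]
            # Nearest-neighbor lookup; a real pipeline would typically
            # use bilinear interpolation instead.
            src_x = int(round(x + dx))
            src_y = int(round(y + dy))
            if 0 <= src_x < width and 0 <= src_y < height:
                warped[y][x] = first_frame[src_y][src_x]
                valid[y][x] = True
    return warped, valid


# Usage: the scene content moves one pixel to the right between frames,
# so each pixel of the later frame comes from one pixel to its left.
frame = [[1, 2, 3],
         [4, 5, 6]]
flow = [[(-1, 0)] * 3 for _ in range(2)]
warped, valid = warp_first_frame(frame, flow)
print(warped)  # [[0, 1, 2], [0, 4, 5]]
print(valid)   # [[False, True, True], [False, True, True]]
```

The `valid` mask marks where the warp has no information (here, the newly revealed left column). Regions like this, along with errors in the estimated flow itself, are one reason to treat the warped frame as a hint rather than a strict constraint.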
Taxonomy
Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Computer Graphics and Visualization Techniques
Methods: Diffusion
