Video Diffusion Models are Strong Video Inpainter
Minhyeok Lee, Suhwan Cho, Chajin Shin, Jungho Lee, Sunghun Yang, Sangyoun Lee

TL;DR
This paper introduces FFF-VDI, a novel video inpainting model that leverages pre-trained image-to-video diffusion models to produce more natural and temporally consistent videos, overcoming optical flow limitations.
Contribution
The paper presents the first integration of image-to-video diffusion models into video inpainting, enhancing quality and temporal consistency over existing propagation-based methods.
Findings
Outperforms optical flow-based methods in quality and consistency
Robustly handles diverse inpainting scenarios
Produces more natural and temporally coherent videos
Abstract
Propagation-based video inpainting using optical flow at the pixel or feature level has recently garnered significant attention. However, it has limitations such as the inaccuracy of optical flow prediction and the propagation of noise over time. These issues result in non-uniform noise and time consistency problems throughout the video, which are particularly pronounced when the removed area is large and involves substantial movement. To address these issues, we propose a novel First Frame Filling Video Diffusion Inpainting model (FFF-VDI). We design FFF-VDI inspired by the capabilities of pre-trained image-to-video diffusion models that can transform the first frame image into a highly natural video. To apply this to the video inpainting task, we propagate the noise latent information of future frames to fill the masked areas of the first frame's noise latent code. Next, we fine-tune…
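The first-frame filling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy arrays, a hypothetical function name `fill_first_frame`, and a deliberately simplified rule that copies the latent value at the same spatial position from the earliest future frame where that position is unmasked, with no motion alignment. The paper's actual propagation scheme may differ.

```python
import numpy as np


def fill_first_frame(latents, masks):
    """Fill masked regions of the first frame's noise latent from future frames.

    Illustrative assumption, not the paper's method: values are copied from the
    same spatial position in the earliest future frame where it is unmasked.

    latents: float array of shape (T, C, H, W), one noise latent per frame.
    masks:   bool array of shape (T, H, W), True where the frame is masked.
    Returns (filled_first_latent, remaining_hole) with shapes (C, H, W), (H, W).
    """
    first = latents[0].copy()
    hole = masks[0].copy()  # positions of frame 0 that still need a value
    for t in range(1, latents.shape[0]):
        # Positions still missing in frame 0 but visible in frame t.
        usable = hole & ~masks[t]
        first[:, usable] = latents[t][:, usable]
        # A position stays a hole only if it is also masked in frame t.
        hole &= masks[t]
    return first, hole
```

Positions that are masked in every frame remain flagged in the returned `remaining_hole` mask; in a setup like the one the abstract describes, those would be left for the diffusion model to synthesize.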
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
Methods: Diffusion · Inpainting
