Hierarchical Masked 3D Diffusion Model for Video Outpainting
Fanda Fan, Chaoxu Guo, Litong Gong, Biao Wang, Tiezheng Ge, Yuning Jiang, Chunjie Luo, Jianfeng Zhan

TL;DR
This paper introduces a hierarchical masked 3D diffusion model for video outpainting, the task of completing missing areas at the edges of video frames. The model maintains temporal consistency across frames, reduces artifacts, and achieves state-of-the-art results.
Contribution
The paper proposes a novel masked 3D diffusion approach with hybrid coarse-to-fine inference and cross-attention guidance for improved video outpainting.
Findings
Achieves state-of-the-art performance on video outpainting benchmarks.
Effectively maintains temporal consistency across frames.
Reduces artifacts with hybrid infilling and interpolation strategies.
Abstract
Video outpainting aims to adequately complete missing areas at the edges of video frames. Compared to image outpainting, it presents an additional challenge as the model should maintain the temporal consistency of the filled area. In this paper, we introduce a masked 3D diffusion model for video outpainting. We use the technique of mask modeling to train the 3D diffusion model. This allows us to use multiple guide frames to connect the results of multiple video clip inferences, thus ensuring temporal consistency and reducing jitter between adjacent frames. Meanwhile, we extract the global frames of the video as prompts and guide the model to obtain information other than the current video clip using cross-attention. We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem. The existing coarse-to-fine pipeline only uses the infilling…
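The guide-frame idea in the abstract can be illustrated with a minimal sketch. It only shows how a long video might be split into overlapping clips, where the first few frames of each clip were already produced by the previous clip and are passed in unmasked as guide frames. The function name `plan_clips`, the parameters `clip_len` and `num_guides`, and their values are assumptions made for illustration; the abstract does not specify them, and the diffusion model itself is not shown.

```python
def plan_clips(num_frames, clip_len=16, num_guides=2):
    """Split a video into overlapping clips for clip-by-clip inference.

    Hypothetical helper, not the authors' code. Returns a list of
    (frame_indices, guide_mask) pairs. guide_mask[i] is True when the
    i-th frame of the clip was already generated by the previous clip
    and would be fed to the model unmasked as a guide frame.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not 0 <= num_guides < clip_len:
        raise ValueError("num_guides must satisfy 0 <= num_guides < clip_len")

    stride = clip_len - num_guides  # new frames produced per clip
    clips = []
    start = 0
    while True:
        end = min(start + clip_len, num_frames)
        indices = list(range(start, end))
        # The first clip has no predecessor, so it has no guide frames.
        guide_mask = [start > 0 and i < start + num_guides for i in indices]
        clips.append((indices, guide_mask))
        if end >= num_frames:
            break
        start += stride
    return clips


if __name__ == "__main__":
    for indices, guide_mask in plan_clips(10, clip_len=4, num_guides=1):
        print(indices, guide_mask)
```

Because consecutive clips share their boundary frames, each clip is conditioned on content the previous clip already committed to, which is the mechanism the abstract credits for reducing jitter between adjacent frames.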
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques
Methods: Contrastive Language-Image Pre-training · Diffusion
