Hierarchical Masked 3D Diffusion Model for Video Outpainting

Fanda Fan; Chaoxu Guo; Litong Gong; Biao Wang; Tiezheng Ge; Yuning Jiang; Chunjie Luo; Jianfeng Zhan

arXiv:2309.02119 · cs.CV · January 22, 2024

TL;DR

This paper introduces a hierarchical masked 3D diffusion model for video outpainting that maintains temporal consistency and reduces artifacts, achieving state-of-the-art results in filling missing areas at the edges of video frames.

Contribution

The paper proposes a novel masked 3D diffusion approach with hybrid coarse-to-fine inference and cross-attention guidance for improved video outpainting.

Findings

01. Achieves state-of-the-art performance on video outpainting benchmarks.

02. Effectively maintains temporal consistency across frames.

03. Reduces artifacts with hybrid infilling and interpolation strategies.

Abstract

Video outpainting aims to adequately complete missing areas at the edges of video frames. Compared to image outpainting, it presents an additional challenge as the model should maintain the temporal consistency of the filled area. In this paper, we introduce a masked 3D diffusion model for video outpainting. We use the technique of mask modeling to train the 3D diffusion model. This allows us to use multiple guide frames to connect the results of multiple video clip inferences, thus ensuring temporal consistency and reducing jitter between adjacent frames. Meanwhile, we extract the global frames of the video as prompts and guide the model to obtain information other than the current video clip using cross-attention. We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem. The existing coarse-to-fine pipeline only uses the infilling…
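The masking idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the left/right-only padding, and the convention that 1 marks a known pixel are all assumptions made for illustration. It builds a binary mask for one clip in which ordinary frames have unknown borders to be generated, while guide frames (for example, frames already completed in an earlier clip) are fully known and can anchor the new clip.

```python
from typing import List, Set

# Hypothetical layout: mask[frame][row][col], 1 = known pixel, 0 = to be generated.
Mask = List[List[List[int]]]


def build_outpainting_mask(
    num_frames: int,
    height: int,
    width: int,
    pad_left: int,
    pad_right: int,
    guide_frames: Set[int],
) -> Mask:
    """Return a binary mask for one video clip.

    Ordinary frames know only the central columns; the left and right
    borders are 0 so a model would have to fill them. Guide frames are
    marked fully known (an assumption about how guidance is encoded).
    """
    if pad_left + pad_right >= width:
        raise ValueError("padding must leave at least one known column")
    known = width - pad_left - pad_right
    mask: Mask = []
    for t in range(num_frames):
        if t in guide_frames:
            frame = [[1] * width for _ in range(height)]
        else:
            row = [0] * pad_left + [1] * known + [0] * pad_right
            frame = [list(row) for _ in range(height)]
        mask.append(frame)
    return mask


def apply_mask(clip, mask):
    """Zero out the unknown pixels of a clip shaped [frame][row][col]."""
    return [
        [[p * m for p, m in zip(prow, mrow)] for prow, mrow in zip(pframe, mframe)]
        for pframe, mframe in zip(clip, mask)
    ]


if __name__ == "__main__":
    # 4 frames of 2x6 pixels; frames 0 and 3 act as guide frames.
    mask = build_outpainting_mask(4, 2, 6, pad_left=2, pad_right=1, guide_frames={0, 3})
    for t, frame in enumerate(mask):
        print(t, frame[0])
```

In a training setup along these lines, the set of guide frames could be sampled at random per clip, so that the same model learns to work with zero, one, or several known frames. That sampling scheme is likewise an assumption here, not a detail stated in the abstract.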

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques

Methods: Contrastive Language-Image Pre-training · Diffusion