ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

Wangbo Yu; Jinbo Xing; Li Yuan; Wenbo Hu; Xiaoyu Li; Zhipeng Huang; Xiangjun Gao; Tien-Tsin Wong; Ying Shan; Yonghong Tian

arXiv:2409.02048 · cs.CV · September 4, 2024 · 3 citations

TL;DR

ViewCrafter introduces a novel approach combining video diffusion models and point-based 3D clues to synthesize high-fidelity, consistent novel views from sparse images, enabling immersive and scene-level text-to-3D applications.

Contribution

The paper presents a new method that leverages video diffusion models with iterative view synthesis and camera planning to improve novel view synthesis from limited input images.
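As a rough illustration of how iterative view synthesis and camera planning can fit together, the Python sketch below alternates between choosing the next camera pose and growing the 3D representation. Everything here is a hypothetical stand-in rather than the paper's method: `render_mask` and `generate_and_lift` are placeholder callables (for point-cloud rendering and for video diffusion plus 3D lifting), and the selection rule, which prefers the candidate exposing the most uncovered pixels while keeping at least `1 - max_hole` of the frame covered, is our own assumption, not ViewCrafter's stated criterion.

```python
from collections.abc import Callable, Hashable, Sequence
from functools import partial
from typing import Optional

import numpy as np


def plan_next_pose(candidates: Sequence[Hashable],
                   coverage: Callable[[Hashable], np.ndarray],
                   max_hole: float = 0.6) -> Optional[int]:
    """Return the index of the chosen candidate pose, or None if none qualifies.

    coverage(pose) is a boolean mask, True where the current point cloud
    already projects. Hypothetical rule: maximize the hole fraction while
    skipping poses whose hole fraction exceeds max_hole.
    """
    best, best_hole = None, -1.0
    for i, pose in enumerate(candidates):
        hole = 1.0 - float(np.mean(coverage(pose)))
        if best_hole < hole <= max_hole:
            best, best_hole = i, hole
    return best


def iterative_synthesis(cloud, candidates, render_mask, generate_and_lift,
                        steps: int, max_hole: float = 0.6):
    """Alternate pose planning and generation for at most `steps` rounds.

    render_mask(cloud, pose) -> coverage mask of `cloud` seen from `pose`.
    generate_and_lift(cloud, pose) -> enlarged cloud; a placeholder for
    generating frames toward `pose` and lifting them back into 3D.
    """
    remaining = list(candidates)
    visited = []
    for _ in range(steps):
        idx = plan_next_pose(remaining, partial(render_mask, cloud), max_hole)
        if idx is None:
            break  # every remaining pose exposes too large an uncovered area
        pose = remaining.pop(idx)
        cloud = generate_and_lift(cloud, pose)
        visited.append(pose)
    return cloud, visited
```

For example, with toy poses whose renders leave 25%, 50%, and 75% of pixels uncovered and `max_hole=0.6`, the loop visits the 50% pose, then the 25% pose, and stops because the remaining one exceeds the cap. Capping the hole fraction reflects a general trade-off in next-best-view planning: a pose that reveals nothing new wastes a generation step, while one that reveals too much leaves the generator little known geometry to stay consistent with.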

Findings

1. Demonstrates strong generalization across diverse datasets.
2. Achieves high-fidelity and consistent novel view synthesis.
3. Enables real-time rendering and scene-level text-to-3D generation.

Abstract

Despite recent advancements in neural 3D reconstruction, its dependence on dense multi-view captures restricts its broader applicability. In this work, we propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images with the prior of a video diffusion model. Our method takes advantage of the powerful generation capabilities of the video diffusion model and the coarse 3D clues offered by a point-based representation to generate high-quality video frames with precise camera pose control. To further enlarge the generation range of novel views, we tailor an iterative view synthesis strategy together with a camera trajectory planning algorithm to progressively extend the 3D clues and the areas covered by the novel views. With ViewCrafter, we can facilitate various applications, such as immersive experiences with real-time…
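The "coarse 3D clues offered by a point-based representation" can be illustrated with a minimal NumPy sketch: lift an image with a known depth map into a colored point cloud, then re-render it from a new camera using a nearest-point z-buffer. The returned mask marks holes that the point cloud cannot cover, which is where a generative prior has to fill in. This assumes a pinhole camera with intrinsics `K` and a world-to-camera pose `(R, t)`, and that per-pixel depth is already available; the function names `unproject` and `render` are hypothetical, and this page does not say how ViewCrafter builds its point cloud or conditions the diffusion model on such renders.

```python
import numpy as np


def unproject(image, depth, K, R, t):
    """Lift every pixel to a colored 3D point in world coordinates.

    image: (H, W, 3) colors; depth: (H, W) camera-space z values;
    K: (3, 3) intrinsics; R, t: world-to-camera rotation and translation.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    cam = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)  # rays scaled to depth z
    world = (cam - t) @ R  # row-vector form of R^T (x_cam - t)
    return world, image.reshape(-1, 3)


def render(points, colors, K, R, t, h, w):
    """Project colored points into a pinhole camera with a nearest-point z-buffer.

    Returns an (h, w, 3) image and an (h, w) mask; False marks holes that
    no point reaches.
    """
    cam = points @ R.T + t
    front = cam[:, 2] > 1e-6  # drop points at or behind the camera plane
    cam, colors = cam[front], colors[front]
    proj = cam @ K.T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, colors = u[ok], v[ok], cam[ok, 2], colors[ok]
    order = np.argsort(z)  # nearest points first
    _, first = np.unique(v[order] * w + u[order], return_index=True)
    keep = order[first]  # the nearest point landing on each pixel
    image = np.zeros((h, w, 3))
    mask = np.zeros((h, w), dtype=bool)
    image[v[keep], u[keep]] = colors[keep]
    mask[v[keep], u[keep]] = True
    return image, mask
```

With `K = R = I`, unit depth, and `t = (-1, 0, 0)`, the re-rendered image is shifted left by one column and the rightmost column comes back as holes in `mask`: the kind of region a generative prior would have to fill. Using `np.unique(..., return_index=True)` on depth-sorted points picks the nearest point per pixel deterministically, rather than relying on the unspecified order of repeated fancy-index assignments.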

Code & Models

Repository: drexubery/viewcrafter (PyTorch)

Taxonomy

Topics: Video Coding and Compression Technologies · Advanced Vision and Imaging · Advanced Image Processing Techniques

Methods: Diffusion