VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model

Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, and Qixing Huang

arXiv:2403.12010·cs.CV·March 19, 2024·1 cites

TL;DR

VideoMV introduces a novel multi-view generation framework leveraging video generative models and a 3D-aware sampling strategy, achieving fast, consistent multi-view image synthesis with high quality and reduced training time.

Contribution

The paper proposes a dense multi-view generation model fine-tuned from video generative models and introduces a 3D-aware denoising sampling method to enhance multi-view consistency.

Findings

01 Generates 24 dense views with high consistency.

02 Converges significantly faster (4 GPU hours) than state-of-the-art methods.

03 Outperforms existing approaches in quantitative metrics and visual quality.

Abstract

Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training, we propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency. Moreover, the video data sets used to train these models are abundant and diverse, leading to a reduced train-finetuning domain gap. To enhance multi-view consistency, we…
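
One way to picture why a video model suits this task: a dense ring of views around an object can be ordered like the frames of a short clip, so a temporal module treats neighbouring views as neighbouring frames. The sketch below is only an illustration under stated assumptions. It lays out 24 cameras at evenly spaced azimuths on a circular orbit. Only the count of 24 comes from the summary above; the even spacing, fixed elevation, radius, and the function name `orbit_camera_positions` are hypothetical and not taken from the paper.

```python
import math


def orbit_camera_positions(num_views=24, radius=1.5, elevation_deg=0.0):
    """Return (x, y, z) camera positions on a circular orbit around the origin.

    Assumption: views are evenly spaced in azimuth at a fixed elevation.
    The paper summary does not specify the camera layout.
    """
    elev = math.radians(elevation_deg)
    positions = []
    for i in range(num_views):
        # Azimuth of the i-th view; consecutive views differ by 360 / num_views degrees.
        azim = 2.0 * math.pi * i / num_views
        x = radius * math.cos(elev) * math.cos(azim)
        y = radius * math.cos(elev) * math.sin(azim)
        z = radius * math.sin(elev)
        positions.append((x, y, z))
    return positions


# Ordered this way, view i and view i + 1 play the role of adjacent video frames.
views = orbit_camera_positions()
```

With 24 views the azimuth step is 15 degrees, so adjacent "frames" overlap heavily in content, which is the kind of input a temporal consistency module is built for.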

Taxonomy

Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Video Surveillance and Tracking Methods

Methods: Diffusion