VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

Junlin Han; Filippos Kokkinos; Philip Torr

arXiv:2403.12034 · cs.CV · July 22, 2024 · 1 citation

TL;DR

VFusion3D introduces a scalable 3D generative model trained on synthetic multi-view data derived from pre-trained video diffusion models, enabling rapid 3D asset creation from a single image with superior quality.

Contribution

The paper proposes leveraging pre-trained video diffusion models to generate large-scale synthetic multi-view data for training 3D generative models, addressing data scarcity.

Findings

01 VFusion3D is trained on nearly 3 million synthetic multi-view samples.

02 It generates a 3D asset from a single image in seconds.

03 It outperforms current state-of-the-art 3D generative models.

Abstract

This paper presents a novel method for building scalable 3D generative models utilizing pre-trained video diffusion models. The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data. Unlike images, texts, or videos, 3D data are not readily accessible and are difficult to acquire. This results in a significant disparity in scale compared to the vast quantities of other types of data. To address this issue, we propose using a video diffusion model, trained with extensive volumes of text, images, and videos, as a knowledge source for 3D data. By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model. The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Music and Audio Processing

Methods: Diffusion