VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Junlin Han, Filippos Kokkinos, Philip Torr

TL;DR
VFusion3D is a scalable 3D generative model trained on synthetic multi-view data generated with a fine-tuned, pre-trained video diffusion model. It creates a 3D asset from a single image in seconds and outperforms current state-of-the-art 3D generative models.
Contribution
The paper proposes leveraging pre-trained video diffusion models to generate large-scale synthetic multi-view data for training 3D generative models, addressing data scarcity.
Findings
VFusion3D is trained on nearly 3M synthetic multi-view data.
Generates 3D assets from a single image in seconds.
Outperforms current state-of-the-art 3D generative models.
Abstract
This paper presents a novel method for building scalable 3D generative models utilizing pre-trained video diffusion models. The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data. Unlike images, texts, or videos, 3D data are not readily accessible and are difficult to acquire. This results in a significant disparity in scale compared to the vast quantities of other types of data. To address this issue, we propose using a video diffusion model, trained with extensive volumes of text, images, and videos, as a knowledge source for 3D data. By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model. The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Music and Audio Processing
Methods: Diffusion
