Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

Jiahao Li; Hao Tan; Kai Zhang; Zexiang Xu; Fujun Luan; Yinghao Xu; Yicong Hong; Kalyan Sunkavalli; Greg Shakhnarovich; Sai Bi

arXiv:2311.06214·cs.CV·November 27, 2023·32 cites

TL;DR

Instant3D is a fast, feed-forward approach that generates high-quality, diverse 3D assets from text prompts. It combines sparse-view image generation with a transformer-based reconstructor that regresses a NeRF from the generated views, reducing generation time to within 20 seconds.
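The reconstruction stage produces a NeRF: a field that returns a density and a color at any 3D point, from which images are obtained by volume rendering. As a point of reference, the sketch below implements the standard NeRF compositing quadrature along a single ray, C = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i with T_i = exp(−Σ_{j<i} σ_j δ_j). It is not the paper's renderer; the function name `render_ray` and the assumption that per-sample densities and colors have already been queried from the predicted NeRF are hypothetical.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def render_ray(
    sigmas: Sequence[float],
    colors: Sequence[RGB],
    deltas: Sequence[float],
) -> Tuple[RGB, float]:
    """Composite samples along one ray with the standard NeRF quadrature.

    sigmas: non-negative densities at each sample, ordered near to far.
    colors: RGB color at each sample.
    deltas: distance from each sample to the next one.
    Returns the composited RGB color and the accumulated opacity.
    """
    if not (len(sigmas) == len(colors) == len(deltas)):
        raise ValueError("sigmas, colors and deltas must have equal length")
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha  # contribution of this sample
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha  # i.e. multiply by exp(-sigma * delta)
    return (rgb[0], rgb[1], rgb[2]), 1.0 - transmittance
```

Because the weights T_i(1 − exp(−σ_i δ_i)) telescope, the returned opacity equals one minus the final transmittance, which is convenient when compositing the ray over a background color.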

Contribution

The paper presents a novel two-stage method that efficiently produces high-quality 3D models from text, addressing both the slow inference of score distillation-based optimization and the low output quality of earlier feed-forward methods.

Findings

01. Generates diverse 3D assets within 20 seconds
02. Achieves high visual quality, comparable to optimization-based methods
03. Runs two orders of magnitude faster than prior optimization-based methods
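To make the "two orders of magnitude" figure concrete, here is the arithmetic under an assumed baseline: an optimization-based method taking about one hour per asset (this baseline time is an illustrative assumption, not a number from this page), compared with the roughly 20 seconds stated above.

```latex
\text{speedup} = \frac{t_{\text{baseline}}}{t_{\text{Instant3D}}}
  = \frac{3600\ \text{s}}{20\ \text{s}} = 180,
\qquad \log_{10} 180 \approx 2.26
```

A one-hour baseline therefore corresponds to just over two orders of magnitude; a slower baseline widens the gap proportionally.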

Abstract

Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity and Janus problems, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm, which first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate diverse 3D assets of high visual quality…
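The abstract says the four views come "in one shot" from a single fine-tuned text-to-image model. One way to achieve that, and the layout assumed in this sketch, is to treat the four views as tiles of one 2×2 grid image, so that a single diffusion sample contains all of them and the tiles are split apart before reconstruction. The code shows only that tiling bookkeeping in pure Python; `tile_2x2`, `split_2x2`, and the tile ordering are hypothetical, not the authors' code.

```python
from typing import List

# A single-channel image stored as a list of rows of pixel values.
Image = List[List[float]]


def tile_2x2(views: List[Image]) -> Image:
    """Tile four equally sized views into one 2x2 grid image.

    Assumed layout: view0 | view1 on the top row, view2 | view3 on the bottom.
    """
    if len(views) != 4:
        raise ValueError("expected exactly four views")
    if not views[0] or not views[0][0]:
        raise ValueError("views must be non-empty")
    h, w = len(views[0]), len(views[0][0])
    if any(len(v) != h or any(len(row) != w for row in v) for v in views):
        raise ValueError("all views must share the same height and width")
    top = [views[0][r] + views[1][r] for r in range(h)]
    bottom = [views[2][r] + views[3][r] for r in range(h)]
    return top + bottom


def split_2x2(grid: Image) -> List[Image]:
    """Invert tile_2x2: cut a 2x2 grid image back into its four views."""
    if not grid or not grid[0]:
        raise ValueError("grid must be non-empty")
    big_h, big_w = len(grid), len(grid[0])
    if big_h % 2 or big_w % 2:
        raise ValueError("grid height and width must be even")
    h, w = big_h // 2, big_w // 2
    # Row-major tile order: top-left, top-right, bottom-left, bottom-right.
    return [
        [row[c0:c0 + w] for row in grid[r0:r0 + h]]
        for r0 in (0, h)
        for c0 in (0, w)
    ]
```

Keeping the split an exact inverse of the tiling means the reconstructor sees each view at its original resolution and in a fixed order, which matters because each tile corresponds to a fixed, structured camera viewpoint.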

Code & Models

Repositories

chenguolin/DiffSplat (PyTorch)

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques

Methods: Sparse Evolutionary Training · Diffusion