Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi

TL;DR
Instant3D introduces a fast, feed-forward approach to generate high-quality, diverse 3D assets from text prompts by combining sparse-view image generation with a transformer-based NeRF reconstruction, significantly reducing inference time.
Contribution
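The paper presents a two-stage, feed-forward method: a fine-tuned 2D text-to-image diffusion model generates four consistent views in one shot, and a transformer-based sparse-view reconstructor regresses a NeRF directly from those views. This avoids the slow inference, low diversity, and Janus problems of score distillation-based optimization, as well as the low quality of earlier feed-forward methods.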
Findings
Generates diverse 3D assets in 20 seconds
Achieves high visual quality comparable to optimization-based methods
Runs about two orders of magnitude faster than prior optimization-based methods
Abstract
Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity and Janus problems, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm, which first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate diverse 3D assets of high visual quality…
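The two-stage data flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation: the names `generate_sparse_views`, `reconstruct_nerf`, `text_to_3d`, and `RadianceField` are hypothetical, and both stages are replaced by trivial stand-ins (flat gray images instead of diffusion samples, a constant field instead of a transformer-regressed NeRF) so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

NUM_VIEWS = 4  # the abstract specifies a sparse set of four views

RGB = Tuple[float, float, float]
Point = Tuple[float, float, float]
Image = List[List[RGB]]  # H x W grid of RGB pixels


@dataclass
class RadianceField:
    """Hypothetical placeholder for a NeRF: maps a 3D point to (density, rgb)."""
    query: Callable[[Point], Tuple[float, RGB]]


def generate_sparse_views(prompt: str, size: int = 8) -> List[Image]:
    """Stage 1 stand-in (assumption): a real system would sample a fine-tuned
    text-to-image diffusion model. Here each view is a flat gray image whose
    brightness depends only on the view index; the prompt is ignored."""
    views = []
    for i in range(NUM_VIEWS):
        shade = (i + 1) / NUM_VIEWS
        views.append([[(shade, shade, shade)] * size for _ in range(size)])
    return views


def reconstruct_nerf(views: List[Image]) -> RadianceField:
    """Stage 2 stand-in (assumption): a real system would run a transformer-based
    sparse-view reconstructor. Here the field has constant density and returns
    the mean color over all input pixels."""
    if len(views) != NUM_VIEWS:
        raise ValueError(f"expected {NUM_VIEWS} views, got {len(views)}")
    pixels = [px for img in views for row in img for px in row]
    mean = tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3))
    return RadianceField(query=lambda point: (1.0, mean))


def text_to_3d(prompt: str) -> RadianceField:
    """Feed-forward pipeline: text -> four views -> radiance field, with no
    per-prompt optimization loop."""
    views = generate_sparse_views(prompt)
    return reconstruct_nerf(views)
```

The point of the sketch is the shape of the pipeline: each stage is a single forward call, so generating an asset requires no iterative per-prompt optimization, which is where the claimed speedup over score distillation comes from.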
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques
Methods: Sparse Evolutionary Training · Diffusion
