Turbo3D: Ultra-fast Text-to-3D Generation
Hanzhe Hu, Tianwei Yin, Fujun Luan, Yiwei Hu, Hao Tan, Zexiang Xu, Sai Bi, Shubham Tulsiani, Kai Zhang

TL;DR
Turbo3D is a novel system that enables ultra-fast, high-quality text-to-3D asset generation in under one second by leveraging a multi-view diffusion generator and a latent space Gaussian reconstructor.
Contribution
The paper introduces a 4-step, 4-view diffusion generator distilled via a Dual-Teacher approach, and moves Gaussian reconstruction from pixel space to latent space for efficiency. The resulting system produces better 3D generation results than previous baselines in a fraction of their runtime.
Findings
Generates 3D assets in under one second.
Outperforms previous methods in quality and speed.
Uses a novel Dual-Teacher distillation for view consistency.
Abstract
We present Turbo3D, an ultra-fast text-to-3D system capable of generating high-quality Gaussian splatting assets in under one second. Turbo3D employs a rapid 4-step, 4-view diffusion generator and an efficient feed-forward Gaussian reconstructor, both operating in latent space. The 4-step, 4-view generator is a student model distilled through a novel Dual-Teacher approach, which encourages the student to learn view consistency from a multi-view teacher and photo-realism from a single-view teacher. By shifting the Gaussian reconstructor's inputs from pixel space to latent space, we eliminate the extra image decoding time and halve the transformer sequence length for maximum efficiency. Our method demonstrates superior 3D generation results compared to previous baselines, while operating in a fraction of their runtime.
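The data flow described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: the view and step counts come from the abstract, while every function name, the latent size, the placeholder "denoiser", and the placeholder "reconstructor" are hypothetical stand-ins. The one point it is meant to show is that the reconstructor consumes the generator's latents directly, so no image decoding step sits between the two stages.

```python
import random

NUM_VIEWS = 4  # from the abstract: 4-view generator
NUM_STEPS = 4  # from the abstract: 4-step generator


def toy_denoise_step(latents, prompt, step):
    """Hypothetical stand-in for one generator step.

    It only scales every value by 0.5; prompt and step are unused here.
    A real model would be a text-conditioned multi-view network.
    """
    return [[x * 0.5 for x in view] for view in latents]


def generate_view_latents(prompt, latent_dim=8, seed=0):
    """Start from noise and run NUM_STEPS steps over NUM_VIEWS latents."""
    rng = random.Random(seed)
    latents = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
               for _ in range(NUM_VIEWS)]
    for step in range(NUM_STEPS):
        latents = toy_denoise_step(latents, prompt, step)
    return latents


def reconstruct_from_latents(view_latents):
    """Hypothetical feed-forward reconstructor.

    It takes the view latents as input (no decoding to pixels first) and
    returns one toy record per view; a real reconstructor would predict
    Gaussian splatting parameters.
    """
    return [{"view": i, "feature": sum(v) / len(v)}
            for i, v in enumerate(view_latents)]


def text_to_3d(prompt):
    """Generator output goes straight into the reconstructor."""
    return reconstruct_from_latents(generate_view_latents(prompt))


if __name__ == "__main__":
    for record in text_to_3d("a wooden chair"):
        print(record)
```

In a pixel-space design, an image decoder would run between `generate_view_latents` and the reconstructor; the abstract attributes part of the speedup to removing that step.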
Taxonomy
Topics: Image Processing and 3D Reconstruction · Human Motion and Animation · Handwritten Text Recognition Techniques
Methods: Diffusion
