Turbo3D: Ultra-fast Text-to-3D Generation

Hanzhe Hu; Tianwei Yin; Fujun Luan; Yiwei Hu; Hao Tan; Zexiang Xu; Sai Bi; Shubham Tulsiani; Kai Zhang

arXiv:2412.04470·cs.CV·December 6, 2024

TL;DR

Turbo3D is a text-to-3D system that generates high-quality Gaussian splatting assets in under one second, using a multi-view diffusion generator and a latent-space Gaussian reconstructor.

Contribution

Turbo3D introduces a 4-step, 4-view diffusion generator distilled via a Dual-Teacher approach and moves Gaussian reconstruction into latent space for efficiency. The authors report better 3D generation results than previous baselines at a fraction of their runtime.

Findings

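01. Generates 3D assets in under one second.

02. Outperforms previous baselines in generation quality while running in a fraction of their runtime.

03. Uses a novel Dual-Teacher distillation: a multi-view teacher supplies view consistency and a single-view teacher supplies photo-realism.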

Abstract

We present Turbo3D, an ultra-fast text-to-3D system capable of generating high-quality Gaussian splatting assets in under one second. Turbo3D employs a rapid 4-step, 4-view diffusion generator and an efficient feed-forward Gaussian reconstructor, both operating in latent space. The 4-step, 4-view generator is a student model distilled through a novel Dual-Teacher approach, which encourages the student to learn view consistency from a multi-view teacher and photo-realism from a single-view teacher. By shifting the Gaussian reconstructor's inputs from pixel space to latent space, we eliminate the extra image decoding time and halve the transformer sequence length for maximum efficiency. Our method demonstrates superior 3D generation results compared to previous baselines, while operating in a fraction of their runtime.
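The Dual-Teacher idea described in the abstract can be illustrated with a toy sketch. The abstract does not specify the distillation objective, so the code below substitutes a plain mean-squared error against each teacher's output. The function names, the toy teachers, the loss weights `w_mv` and `w_sv`, and the tensor shapes are all assumptions made for illustration, not the authors' implementation. It is meant to show only the structure: the multi-view teacher sees all four views jointly, while the single-view teacher sees each view on its own.

```python
import numpy as np


def toy_multi_view_teacher(views):
    """Hypothetical stand-in: pulls every view toward the cross-view mean."""
    mean = views.mean(axis=0, keepdims=True)
    return np.broadcast_to(mean, views.shape)


def toy_single_view_teacher(view):
    """Hypothetical stand-in: keeps one view inside a plausible value range."""
    return np.clip(view, -1.0, 1.0)


def dual_teacher_loss(student_views, multi_view_teacher, single_view_teacher,
                      w_mv=1.0, w_sv=1.0):
    """Combine signals from two teachers for one object.

    student_views: array of shape (V, C, H, W), the student's V view latents.
    The MSE terms are a simplification chosen for this sketch; the paper's
    actual distillation objective is not given in the abstract.
    """
    # The multi-view teacher receives all views at once (view consistency).
    mv_target = multi_view_teacher(student_views)
    mv_loss = float(np.mean((student_views - mv_target) ** 2))

    # The single-view teacher receives each view independently (per-image quality).
    sv_target = np.stack([single_view_teacher(v) for v in student_views])
    sv_loss = float(np.mean((student_views - sv_target) ** 2))

    return w_mv * mv_loss + w_sv * sv_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = rng.normal(size=(4, 4, 8, 8))  # 4 views; latent shape is assumed
    print(dual_teacher_loss(views, toy_multi_view_teacher, toy_single_view_teacher))
```

Setting `w_sv` to zero in this toy reduces it to a purely multi-view objective, which makes the separate role of each teacher easy to inspect.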

Taxonomy

Topics: Image Processing and 3D Reconstruction · Human Motion and Animation · Handwritten Text Recognition Techniques

Methods: Diffusion