ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

Yiming Chen; Zhiqi Li; Peidong Liu

arXiv:2311.15561·cs.CV·November 28, 2023·1 citation

Open Access

TL;DR

ET3D introduces a fast, efficient text-to-3D generation method that leverages pre-trained text-to-image models and a 3D GAN, enabling rapid 3D asset creation without 3D training data.

Contribution

The paper presents a novel approach that distills pre-trained text-to-image diffusion models into a 3D GAN for real-time text-to-3D generation.
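The data side of this distillation can be sketched in a few lines. This is a minimal sketch, not the paper's code. It assumes that the frozen teacher can be queried for an image of a prompt from a given camera azimuth, and that views are sampled at evenly spaced azimuths. `Teacher`, `TeacherView`, `sample_azimuths`, and `build_teacher_batch` are hypothetical names, and the stub teacher stands in for a real pretrained diffusion model.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

# Hypothetical teacher interface: (prompt, azimuth in degrees) -> image.
Teacher = Callable[[str, float], Any]


@dataclass(frozen=True)
class TeacherView:
    prompt: str
    azimuth_deg: float  # camera azimuth the teacher image was generated for
    image: Any          # e.g. an HxWx3 array from a real diffusion model


def sample_azimuths(num_views: int, rng: random.Random) -> List[float]:
    """Evenly spaced azimuths with one random offset per prompt (assumed pose scheme)."""
    offset = rng.uniform(0.0, 360.0)
    step = 360.0 / num_views
    return [(offset + i * step) % 360.0 for i in range(num_views)]


def build_teacher_batch(
    prompts: Sequence[str], teacher: Teacher, num_views: int, seed: int = 0
) -> List[TeacherView]:
    """Query the frozen teacher for several views of each prompt.

    In this sketch the returned images act as the 'real' samples for a
    text-conditioned discriminator; no 3D ground truth is involved.
    """
    if num_views < 1:
        raise ValueError("num_views must be at least 1")
    rng = random.Random(seed)
    batch: List[TeacherView] = []
    for prompt in prompts:
        for az in sample_azimuths(num_views, rng):
            batch.append(TeacherView(prompt, az, teacher(prompt, az)))
    return batch


def stub_teacher(prompt: str, azimuth_deg: float) -> str:
    # Placeholder for a pretrained text-to-image diffusion model.
    return f"<image of {prompt!r} at {azimuth_deg:.0f} deg>"


if __name__ == "__main__":
    for view in build_teacher_batch(["a red chair"], stub_teacher, num_views=4):
        print(round(view.azimuth_deg, 1), view.image)
```

In a full training loop, the student generator would render the same prompt from the same azimuths, and a text-conditioned discriminator would try to tell those renders apart from the teacher's views. How ET3D actually samples poses and pairs views is not stated in this summary.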

Findings

01

Generation takes approximately 8 ms per 3D asset on a consumer graphics card.

02

No 3D training data is required for the method.

03

Generation is fast enough for real-time applications.

Abstract

Recent breakthroughs in text-to-image generation have shown encouraging results with large generative models. Because 3D assets are scarce, it is hard to transfer this success to text-to-3D generation. Existing text-to-3D methods usually follow the DreamFusion paradigm, which performs per-asset optimization by distilling a pretrained text-to-image diffusion model. Generation typically takes from several minutes to tens of minutes per 3D asset. This degrades the user experience and places a heavy computational burden on service providers. In this work, we present an efficient text-to-3D generation method that requires only around 8 ms to generate a 3D asset from a text prompt on a consumer graphics card. The main insight is that we exploit the images generated by a large pre-trained…
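A quick back-of-the-envelope comparison puts the reported ~8 ms in context against per-asset optimization. The 5- and 30-minute baselines are illustrative points inside the "several minutes to tens of minutes" range quoted above, not measurements from the paper. The 60 fps frame budget is an assumed real-time target. The function names are hypothetical helpers for this arithmetic only.

```python
MS_PER_S = 1000.0


def assets_per_second(latency_ms: float) -> float:
    """Throughput of a feed-forward generator with the given per-asset latency."""
    return MS_PER_S / latency_ms


def speedup(baseline_minutes: float, latency_ms: float) -> float:
    """How many times faster latency_ms is than a per-asset optimization of baseline_minutes."""
    return baseline_minutes * 60.0 * MS_PER_S / latency_ms


def fits_frame_budget(latency_ms: float, fps: float = 60.0) -> bool:
    """True if one generation fits within a single frame at the given frame rate."""
    return latency_ms <= MS_PER_S / fps


ET3D_MS = 8.0  # reported per-asset generation time

print(assets_per_second(ET3D_MS))  # 125.0
print(speedup(5, ET3D_MS))         # 37500.0
print(speedup(30, ET3D_MS))        # 225000.0
print(fits_frame_budget(ET3D_MS))  # True
```

Under these assumptions, one 8 ms generation is tens of thousands to hundreds of thousands of times faster than per-asset optimization, and it fits inside a 16.7 ms frame at 60 fps. The comparison ignores the one-time cost of training the generator, which is amortized over every asset it later produces.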

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion