Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

Xinyang Li; Zhangyu Lai; Linning Xu; Jianfei Guo; Liujuan Cao; Shengchuan Zhang; Bo Dai; Rongrong Ji

arXiv:2405.09874 · cs.CV · May 17, 2024 · 1 citation

TL;DR

Dual3D introduces a dual-mode latent diffusion framework for rapid, high-quality text-to-3D asset generation, leveraging pre-trained models and a toggling inference strategy to achieve 10-second generation times.

Contribution

It proposes a novel dual-mode multi-view latent diffusion model with a toggling inference strategy, enabling fast and consistent text-to-3D generation from pre-trained models.
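The toggling inference strategy can be illustrated with a small scheduling sketch. The abstract states only that the 3D mode is used for 1/10 of the denoising steps; the placement rule below (run the 3D mode on every tenth step counted back from the final one), the 50-step example, and the relative per-step costs are assumptions for illustration, not Dual3D's published configuration. The names `toggling_schedule` and `schedule_cost` are hypothetical.

```python
from typing import List


def toggling_schedule(num_steps: int, ratio: int = 10) -> List[str]:
    """Return the mode ("2d" or "3d") for each denoising step, noisiest first.

    Hypothetical rule (not taken from the paper): a step runs in 3D mode when
    its distance from the final step is a multiple of `ratio`, so roughly
    1/ratio of the steps use 3D mode and the last step always does.
    """
    if num_steps <= 0 or ratio <= 0:
        raise ValueError("num_steps and ratio must be positive")
    modes = []
    for i in range(num_steps):
        steps_from_end = num_steps - 1 - i
        modes.append("3d" if steps_from_end % ratio == 0 else "2d")
    return modes


def schedule_cost(modes: List[str], cost_2d: float, cost_3d: float) -> float:
    """Sum assumed per-step costs; a 3D-mode step adds rendering on top of denoising."""
    return sum(cost_3d if m == "3d" else cost_2d for m in modes)


if __name__ == "__main__":
    modes = toggling_schedule(50)
    print(modes.count("3d"))  # 5 of 50 steps
    # Placeholder costs: a 3D-mode step is assumed to be 5x a 2D-mode step.
    print(schedule_cost(modes, cost_2d=1.0, cost_3d=5.0))        # 70.0
    print(schedule_cost(["3d"] * 50, cost_2d=1.0, cost_3d=5.0))  # 250.0
```

Counting from the end is a design choice of this sketch: it guarantees the final step passes through the 3D mode, which is the mode that produces a renderable tri-plane. Whether Dual3D places its 3D-mode steps this way is not stated in the abstract.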

Findings

01. Generates 3D assets in 10 seconds with high quality
02. Achieves state-of-the-art performance on text-to-3D generation
03. Uses the costly 3D mode for only 1/10 of the denoising steps without loss of quality

Abstract

We present Dual3D, a novel text-to-3D generation framework that generates high-quality 3D assets from text in only $1$ minute. The key component is a dual-mode multi-view latent diffusion model. Given the noisy multi-view latents, the 2D mode can efficiently denoise them with a single latent denoising network, while the 3D mode can generate a tri-plane neural surface for consistent rendering-based denoising. Most modules for both modes are tuned from a pre-trained text-to-image latent diffusion model to avoid the high cost of training from scratch. To overcome the high rendering cost during inference, we propose the dual-mode toggling inference strategy, which uses the 3D mode for only $1/10$ of the denoising steps, successfully generating a 3D asset in just $10$ seconds without sacrificing quality. The texture of the 3D asset can be further enhanced by our efficient texture refinement…
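The 3D mode produces a tri-plane neural surface. As a generic illustration of how tri-plane representations are commonly queried (the EG3D-style scheme), not Dual3D's own implementation: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are summed. The plane layout, the [-1, 1] coordinate convention, summation as the aggregation, and the helper names `bilinear` and `query_triplane` are assumptions. A neural surface would additionally decode this feature with a small network (for example into a signed distance and a color), which is omitted here.

```python
from typing import List, Sequence, Tuple

# A feature plane stored as nested lists indexed [row][col][channel].
Plane = List[List[List[float]]]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample `plane` at continuous coords u (columns), v (rows) in [-1, 1]."""
    h, w = len(plane), len(plane[0])
    if h < 2 or w < 2:
        raise ValueError("plane must be at least 2x2")
    # Clamp to the plane's extent, then map [-1, 1] onto pixel indices [0, size - 1].
    u = max(-1.0, min(1.0, u))
    v = max(-1.0, min(1.0, v))
    x = (u + 1.0) * 0.5 * (w - 1)
    y = (v + 1.0) * 0.5 * (h - 1)
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    tx, ty = x - x0, y - y0
    out = []
    for k in range(len(plane[0][0])):
        top = (1 - tx) * plane[y0][x0][k] + tx * plane[y0][x0 + 1][k]
        bottom = (1 - tx) * plane[y0 + 1][x0][k] + tx * plane[y0 + 1][x0 + 1][k]
        out.append((1 - ty) * top + ty * bottom)
    return out


def query_triplane(planes: Tuple[Plane, Plane, Plane],
                   point: Sequence[float]) -> List[float]:
    """Project (x, y, z) onto the xy, xz and yz planes and sum the sampled features."""
    xy, xz, yz = planes
    x, y, z = point
    f_xy = bilinear(xy, x, y)
    f_xz = bilinear(xz, x, z)
    f_yz = bilinear(yz, y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


if __name__ == "__main__":
    # 2x2 planes with one channel whose value equals the column index.
    ramp = [[[0.0], [1.0]], [[0.0], [1.0]]]
    print(query_triplane((ramp, ramp, ramp), (0.0, 0.0, 0.0)))  # [1.5]
```

Summation keeps the feature width independent of the number of planes; concatenation is a common alternative that preserves which plane each feature came from.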

Taxonomy

Topics: Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques

Methods: Latent Diffusion Model · Diffusion