Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion
Xinyang Li, Zhangyu Lai, Linning Xu, Jianfei Guo, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

TL;DR
Dual3D introduces a dual-mode latent diffusion framework for rapid, high-quality text-to-3D asset generation, leveraging pre-trained models and a toggling inference strategy to achieve 10-second generation times.
Contribution
It proposes a novel dual-mode multi-view latent diffusion model with a toggling inference strategy, enabling fast and consistent text-to-3D generation from pre-trained models.
Findings
Generates 3D assets in 10 seconds with high quality
Achieves state-of-the-art performance in text-to-3D tasks
Reduces inference steps by a factor of 10 without quality loss
Abstract
We present Dual3D, a novel text-to-3D generation framework that generates high-quality 3D assets from texts in only 1 minute. The key component is a dual-mode multi-view latent diffusion model. Given the noisy multi-view latents, the 2D mode can efficiently denoise them with a single latent denoising network, while the 3D mode can generate a tri-plane neural surface for consistent rendering-based denoising. Most modules for both modes are tuned from a pre-trained text-to-image latent diffusion model to circumvent the expensive cost of training from scratch. To overcome the high rendering cost during inference, we propose the dual-mode toggling inference strategy to use only 1/10 denoising steps with 3D mode, successfully generating a 3D asset in just 10 seconds without sacrificing quality. The texture of the 3D asset can be further enhanced by our efficient texture refinement…
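The toggling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the evenly spaced placement of 3D steps, and the choice to end on a 3D step are all assumptions made for illustration, and the two denoisers are stand-in callables rather than real networks.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def toggle_schedule(num_steps: int, ratio_3d: int = 10) -> List[str]:
    """Pick a mode for each denoising step.

    Assumption (not from the paper): one 3D step every `ratio_3d` steps,
    and the final step is always 3D so the loop ends on a 3D-consistent result.
    """
    if num_steps <= 0 or ratio_3d <= 0:
        raise ValueError("num_steps and ratio_3d must be positive")
    return [
        "3d" if (i + 1) % ratio_3d == 0 or i == num_steps - 1 else "2d"
        for i in range(num_steps)
    ]


def sample(
    latents: T,
    num_steps: int,
    denoise_2d: Callable[[T, int], T],
    denoise_3d: Callable[[T, int], T],
    ratio_3d: int = 10,
) -> T:
    """Run the denoising loop, switching between the two hypothetical denoisers."""
    for step, mode in enumerate(toggle_schedule(num_steps, ratio_3d)):
        # The cheap 2D denoiser handles most steps; the costly
        # rendering-based 3D denoiser runs only on the scheduled steps.
        denoise = denoise_3d if mode == "3d" else denoise_2d
        latents = denoise(latents, step)
    return latents


if __name__ == "__main__":
    schedule = toggle_schedule(50, ratio_3d=10)
    print(schedule.count("3d"), "of", len(schedule), "steps use the 3D mode")
```

With 50 steps and `ratio_3d=10`, this schedule assigns 5 steps to the 3D mode and 45 to the 2D mode, which matches the 1/10 proportion described above under the stated assumptions.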
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques
Methods: Latent Diffusion Model · Diffusion
