DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data
Qihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, Alan Yuille

TL;DR
DIRECT-3D is a diffusion-based 3D generative model trained directly on noisy, unaligned 3D data. It generates high-quality, detailed 3D assets from text prompts and reports state-of-the-art results.
Contribution
The paper presents a novel tri-plane diffusion model that automatically filters and aligns noisy 3D data during training, improving large-scale text-to-3D generation.
Findings
Achieves state-of-the-art performance in text-to-3D generation.
Generates high-resolution, realistic 3D objects in seconds.
Can serve as a 3D prior to improve other methods.
Abstract
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets (represented by Neural Radiance Fields) from text prompts. Unlike recent 3D generative models that rely on clean and well-aligned 3D data, limiting them to single or few-class generation, our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets, mitigating the key challenge (i.e., data scarcity) in large-scale 3D generation. In particular, DIRECT-3D is a tri-plane diffusion model that integrates two innovations: 1) A novel learning framework where noisy data are filtered and aligned automatically during the training process. Specifically, after an initial warm-up phase using a small set of clean data, an iterative optimization is introduced in the diffusion process to explicitly estimate the 3D pose of objects and select beneficial data based on conditional…
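For readers unfamiliar with the tri-plane representation named in the abstract, the following is a minimal sketch of how a generic tri-plane is usually queried: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are combined. This is not the authors' implementation. The function names (`bilinear_sample`, `query_triplane`), the plane ordering (XY, XZ, YZ), the coordinate range [-1, 1], and the use of summation to combine features are all assumptions made for illustration.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample a feature plane.

    plane: array of shape (C, H, W) with H, W >= 2.
    u, v:  arrays of shape (N,) with coordinates in [-1, 1]
           (u indexes the W axis, v indexes the H axis).
    Returns an array of shape (N, C).
    """
    _, H, W = plane.shape
    # Map normalized coordinates to continuous pixel coordinates.
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    # Index of the top-left neighbor, clipped so that +1 stays in bounds.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    # Interpolation weights inside the cell.
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    # Gather the four neighbors; each has shape (C, N).
    f00 = plane[:, y0, x0]
    f01 = plane[:, y0, x0 + 1]
    f10 = plane[:, y0 + 1, x0]
    f11 = plane[:, y0 + 1, x0 + 1]
    top = f00 * (1.0 - wx) + f01 * wx
    bottom = f10 * (1.0 - wx) + f11 * wx
    return (top * (1.0 - wy) + bottom * wy).T


def query_triplane(planes, points):
    """Query tri-plane features at 3D points.

    planes: array of shape (3, C, H, W); assumed order is XY, XZ, YZ.
    points: array of shape (N, 3) with coordinates in [-1, 1]^3.
    Returns an array of shape (N, C).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    f_xy = bilinear_sample(planes[0], x, y)
    f_xz = bilinear_sample(planes[1], x, z)
    f_yz = bilinear_sample(planes[2], y, z)
    # Summation is one common way to combine the three features
    # (an assumption here; concatenation is another option).
    return f_xy + f_xz + f_yz


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    planes = rng.normal(size=(3, 16, 32, 32))
    points = rng.uniform(-1.0, 1.0, size=(5, 3))
    features = query_triplane(planes, points)
    print(features.shape)
```

In a tri-plane NeRF, the returned feature vector would typically be passed through a small MLP to predict density and color at each point. How DIRECT-3D decodes and diffuses its planes is not specified in this summary.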
Taxonomy
Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Human Motion and Animation
Methods: Sparse Evolutionary Training · Diffusion
