OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control
Yuzhong Huang, Zhong Li, Zhang Chen, Zhiyuan Ren, Guosheng Lin, Fred Morstatter, Yi Xu

TL;DR
OrientDream introduces an efficient, camera orientation conditioned framework for text-to-3D generation that improves fidelity, multi-view consistency, and optimization speed by leveraging explicit orientation features and external multi-view data.
Contribution
Presents a novel orientation-conditioned approach that enhances multi-view consistency and accelerates optimization in text-to-3D generation.
Findings
Produces high-quality, multi-view consistent NeRF models
Achieves faster optimization compared to existing methods
Uses MVImgNet, an external multi-view dataset, to improve diffusion model performance
Abstract
In the evolving landscape of text-to-3D technology, Dreamfusion has showcased its proficiency by utilizing Score Distillation Sampling (SDS) to optimize implicit representations such as NeRF. This process is achieved through the distillation of pretrained large-scale text-to-image diffusion models. However, Dreamfusion encounters fidelity and efficiency constraints: it faces the multi-head Janus issue and exhibits a relatively slow optimization process. To circumvent these challenges, we introduce OrientDream, a camera orientation conditioned framework designed for efficient and multi-view consistent 3D generation from textual prompts. Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module. This feature effectively utilizes data from MVImgNet, an extensive external multi-view dataset, to…
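For readers unfamiliar with Score Distillation Sampling, the following is a minimal sketch of the standard SDS gradient, w(t) * (predicted noise - injected noise), pushed back through the renderer. It is not the OrientDream implementation. Everything here is an assumption made for illustration: the "renderer" is the identity map (the parameters are the image), the pretrained diffusion model is replaced by a hypothetical closed-form toy denoiser whose data distribution is a single target image, and camera orientation conditioning is not modeled. The names sds_gradient, make_toy_denoiser, alpha_bar, and mu are hypothetical.

```python
import numpy as np


def sds_gradient(x, predict_noise, alpha_bar, rng, weight=1.0):
    """One-sample SDS gradient with respect to the rendered image x.

    Implements weight * (eps_hat - eps), where eps is the injected noise
    and eps_hat is the denoiser's noise prediction on the noised image.
    """
    eps = rng.standard_normal(x.shape)
    # Forward diffusion: noise the rendered image to level alpha_bar.
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = predict_noise(x_t, alpha_bar)
    return weight * (eps_hat - eps)


def make_toy_denoiser(target):
    """Hypothetical stand-in for a pretrained diffusion model.

    If the data distribution is a point mass at `target`, the exact
    noise prediction is (x_t - sqrt(alpha_bar) * target) / sqrt(1 - alpha_bar).
    """
    def predict_noise(x_t, alpha_bar):
        return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)
    return predict_noise


rng = np.random.default_rng(0)
mu = np.array([[0.2, 0.8], [0.5, 0.1]])  # toy "image" the denoiser prefers
theta = np.zeros_like(mu)                # 3D parameters; identity renderer assumed
predict_noise = make_toy_denoiser(mu)

for _ in range(200):
    # With an identity renderer, dx/dtheta = I, so the SDS gradient
    # applies to theta directly.
    grad = sds_gradient(theta, predict_noise, alpha_bar=0.5, rng=rng)
    theta -= 0.1 * grad
```

In this toy setting the injected noise cancels exactly, so each step moves theta straight toward mu. With a real diffusion model the update is noisy and the renderer's Jacobian must be applied, which is where the slow optimization mentioned in the abstract comes from.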
Taxonomy
Topics: Human Motion and Animation · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion
