OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

Yuzhong Huang, Zhong Li, Zhang Chen, Zhiyuan Ren, Guosheng Lin, Fred Morstatter, Yi Xu

arXiv:2406.10000·cs.CV·June 17, 2024

TL;DR

OrientDream introduces an efficient, camera-orientation-conditioned framework for text-to-3D generation. It improves fidelity, multi-view consistency, and optimization speed by using explicit orientation features and external multi-view data.

Contribution

The paper presents an orientation-conditioned approach that improves multi-view consistency and accelerates optimization in text-to-3D generation.

Findings

01 Produces high-quality, multi-view-consistent NeRF models

02 Achieves faster optimization than existing methods

03 Uses an external multi-view dataset to improve diffusion model performance

Abstract

In the evolving landscape of text-to-3D technology, Dreamfusion has showcased its proficiency by utilizing Score Distillation Sampling (SDS) to optimize implicit representations such as NeRF. This process is achieved through the distillation of pretrained large-scale text-to-image diffusion models. However, Dreamfusion encounters fidelity and efficiency constraints: it faces the multi-head Janus issue and exhibits a relatively slow optimization process. To circumvent these challenges, we introduce OrientDream, a camera orientation conditioned framework designed for efficient and multi-view consistent 3D generation from textual prompts. Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module. This feature effectively utilizes data from MVImgNet, an extensive external multi-view dataset, to…
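To make the Score Distillation Sampling idea in the abstract concrete, here is a minimal toy sketch of an SDS update loop with an orientation-conditioned noise predictor. Everything in it is an assumption made for illustration and is not the authors' implementation: the "scene" is one scalar pixel per view, `render` just looks that pixel up, `toy_noise_predictor` stands in for a pretrained diffusion model by assuming the ideal image for each view is a known constant (`VIEW_TARGETS`), and the weighting `w = sigma ** 2` is an arbitrary choice. The only part meant to mirror SDS is the update direction `w(t) * (eps_hat - eps)` applied to the rendered output.

```python
import random

# Hypothetical per-view "ideal" pixel values that the toy predictor knows about.
VIEW_TARGETS = {"front": 1.0, "back": -1.0}


def toy_noise_predictor(x_t, alpha, sigma, view):
    """Stand-in for an orientation-conditioned diffusion model (assumption).

    If the data for `view` is a point mass at its target value, the exact
    noise estimate for x_t = alpha * x + sigma * eps is given below.
    """
    return (x_t - alpha * VIEW_TARGETS[view]) / sigma


def render(theta, view):
    """Trivial 'renderer': the scene stores one scalar pixel per view."""
    return theta[view]


def sds_step(theta, lr, rng):
    """One SDS-style update on a randomly sampled camera orientation."""
    view = rng.choice(sorted(VIEW_TARGETS))        # sample a camera orientation
    t = rng.uniform(0.02, 0.98)                    # sample a diffusion time
    alpha, sigma = (1.0 - t) ** 0.5, t ** 0.5      # alpha^2 + sigma^2 = 1
    eps = rng.gauss(0.0, 1.0)                      # sample Gaussian noise

    x = render(theta, view)
    x_t = alpha * x + sigma * eps                  # noised rendering
    eps_hat = toy_noise_predictor(x_t, alpha, sigma, view)

    w = sigma ** 2                                 # weighting w(t), chosen arbitrarily
    grad = w * (eps_hat - eps)                     # SDS direction; d(render)/d(theta) = 1 here
    theta[view] -= lr * grad
    return theta


def optimize(steps=2000, lr=0.5, seed=0):
    """Run the toy SDS loop from a scene initialised to zero for every view."""
    rng = random.Random(seed)
    theta = {"front": 0.0, "back": 0.0}
    for _ in range(steps):
        sds_step(theta, lr, rng)
    return theta


if __name__ == "__main__":
    print(optimize())
```

Because the toy predictor is conditioned on the sampled view, each view is pulled toward its own target rather than every view being pulled toward the same front-facing appearance, which is a simplified picture of why orientation conditioning is expected to help with the multi-head Janus issue.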

Taxonomy

Topics: Human Motion and Animation · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion