X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation
Yiwei Ma, Yijun Fan, Jiayi Ji, Haowei Wang, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji

TL;DR
X-Dreamer introduces a novel method that bridges the domain gap between 2D and 3D generation, improving the quality and accuracy of text-to-3D content by incorporating camera guidance and attention-mask alignment.
Contribution
The paper proposes two components, CG-LoRA (Camera-Guided Low-Rank Adaptation) and the AMA (Attention-Mask Alignment) loss, which adapt pretrained 2D diffusion models for optimizing 3D representations.
Findings
Outperforms existing text-to-3D methods in quality and accuracy.
Effectively incorporates camera information into diffusion models.
Focuses on foreground object detail and alignment.
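To make the attention-mask alignment idea concrete, here is a minimal sketch. It assumes the loss is a binary cross-entropy between a min-max-normalized attention map and the binary mask of the rendered object. This summary does not give the paper's exact formulation, so the function name `ama_loss_sketch`, the normalization, and the choice of cross-entropy are illustrative assumptions rather than the authors' implementation.

```python
import math


def ama_loss_sketch(attention, mask, eps=1e-7):
    """Hypothetical attention-mask alignment loss (not the paper's code).

    attention: 2D list of floats, e.g. an attention map from a diffusion model.
    mask:      2D list of 0/1 values, the rendered foreground mask.
    Returns the mean binary cross-entropy between the min-max-normalized
    attention map and the mask.
    """
    flat_a = [a for row in attention for a in row]
    flat_m = [m for row in mask for m in row]
    if len(flat_a) != len(flat_m):
        raise ValueError("attention and mask must have the same number of cells")

    lo, hi = min(flat_a), max(flat_a)
    span = hi - lo

    total = 0.0
    for a, m in zip(flat_a, flat_m):
        # Assumed normalization: rescale attention to [0, 1]; flat maps become 0.5.
        p = (a - lo) / span if span > 0 else 0.5
        # Clamp away from 0 and 1 so the logarithms stay finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(m * math.log(p) + (1 - m) * math.log(1.0 - p))
    return total / len(flat_a)


# Toy 2x2 example: foreground occupies the right column.
mask = [[0, 1], [0, 1]]
aligned = [[0.1, 0.9], [0.1, 0.9]]      # attention concentrated on the object
misaligned = [[0.9, 0.1], [0.9, 0.1]]   # attention concentrated on the background

print(ama_loss_sketch(aligned, mask) < ama_loss_sketch(misaligned, mask))
```

Under these assumptions, attention that falls on the foreground mask yields a lower loss than attention that falls on the background, which is the behavior a foreground-focusing loss would need.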
Abstract
In recent times, automatic text-to-3D content creation has made significant progress, driven by the development of pretrained 2D diffusion models. Existing text-to-3D methods typically optimize the 3D representation to ensure that the rendered image aligns well with the given text, as evaluated by the pretrained 2D diffusion model. Nevertheless, a substantial domain gap exists between 2D images and 3D assets, primarily attributed to variations in camera-related attributes and the exclusive presence of foreground objects. Consequently, employing 2D diffusion models directly for optimizing 3D representations may lead to suboptimal outcomes. To address this issue, we present X-Dreamer, a novel approach for high-quality text-to-3D content creation that effectively bridges the gap between text-to-2D and text-to-3D synthesis. The key components of X-Dreamer are two innovative designs:…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques
Methods: Diffusion
