JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation
Chenhan Jiang, Yihan Zeng, Tianyang Hu, Songcun Xu, Wei Zhang, Hang Xu, Dit-Yan Yeung

TL;DR
JointDreamer introduces a joint score distillation method that improves 3D consistency and text alignment in text-to-3D generation by modeling the coherence among multiple views, surpassing previous view-independent approaches.
Contribution
The paper proposes Joint Score Distillation (JSD), a framework that enforces coherence across views in text-to-3D generation, substantially reducing 3D inconsistency while preserving fidelity to the text prompt.
Findings
Reduces 3D inconsistency in generated models
Achieves 88.5% CLIP R-Precision in text alignment
Improves geometric and texture fidelity
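CLIP R-Precision, the text-alignment metric cited above, is commonly computed as top-1 retrieval accuracy: each rendering of a generated object is embedded with CLIP, compared against the embeddings of a set of candidate prompts, and counted as correct if its own prompt scores highest. The sketch below assumes precomputed embeddings; the CLIP model, prompt set, and number of rendered views used in the paper are not stated here, and the function name `clip_r_precision` is illustrative only.

```python
import numpy as np

def clip_r_precision(image_emb, text_emb, true_idx):
    """Top-1 retrieval accuracy of prompts given image embeddings.

    image_emb: (M, D) one CLIP image embedding per rendering (assumed precomputed)
    text_emb:  (K, D) CLIP text embeddings of the candidate prompts
    true_idx:  (M,) index of the prompt each rendering was generated from
    """
    # L2-normalise so the dot product is cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = img @ txt.T                 # (M, K) cosine similarities
    top1 = sims.argmax(axis=1)         # best-matching prompt per rendering
    return float((top1 == np.asarray(true_idx)).mean())

# Toy example with 2-D "embeddings": the third rendering is closer to the
# wrong prompt, so 2 of 3 retrievals succeed.
images = np.array([[1.0, 0.1], [0.1, 1.0], [0.9, 0.2]])
prompts = np.array([[1.0, 0.0], [0.0, 1.0]])
score = clip_r_precision(images, prompts, [0, 1, 1])
```

A reported figure such as 88.5% corresponds to this fraction over the full evaluation set, under whatever protocol the authors used.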
Abstract
Score Distillation Sampling (SDS) with well-trained 2D diffusion models has shown great promise in text-to-3D generation. However, this paradigm distills view-agnostic 2D image distributions into the rendering distribution of the 3D representation for each view independently, overlooking the coherence across views and yielding 3D inconsistency in generations. In this work, we propose Joint Score Distillation (JSD), a new paradigm that ensures coherent 3D generations. Specifically, we model the joint image distribution, which introduces an energy function to capture the coherence among denoised images from the diffusion model. We then derive the joint score distillation on multiple rendered views of the 3D representation, as opposed to a single view in SDS. In addition, we instantiate three universal view-aware models as energy functions, demonstrating compatibility…
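The abstract contrasts per-view SDS with a joint objective over several rendered views. The NumPy sketch below illustrates that contrast under explicit assumptions, and is not the authors' implementation: it assumes the standard SDS residual w(t)(ε̂ − ε) used in DreamFusion-style pipelines, assumes the joint density takes the form ∏ᵢ p(xᵢ) · exp(−λE(x₁, …, x_N)), and substitutes a hypothetical mean-deviation energy for the paper's view-aware energy models. The names `sds_residual`, `joint_residual`, and `mean_deviation_energy_grad` are illustrative.

```python
import numpy as np

def sds_residual(eps_pred, eps, w_t):
    """Per-view SDS term w(t) * (eps_hat - eps), applied to each view independently.

    In a full pipeline this residual is back-propagated through the renderer
    (multiplied by dx/dtheta); that step is omitted in this toy.
    """
    return w_t * (eps_pred - eps)

def mean_deviation_energy_grad(x_views):
    """Gradient of a HYPOTHETICAL coherence energy E = 0.5 * sum_i ||x_i - mean(x)||^2.

    Stand-in for the paper's view-aware energy models. dE/dx_i simplifies to
    x_i - mean(x) because the deviations from the mean sum to zero.
    """
    return x_views - x_views.mean(axis=0, keepdims=True)

def joint_residual(x_views, eps_pred, eps, w_t, sigma_t, energy_grad, lam):
    """Joint residual for N views under the ASSUMED density
    p(x_1..x_N) proportional to prod_i p(x_i) * exp(-lam * E(x_1..x_N)).

    Since score = -eps / sigma, subtracting lam * dE/dx_i from view i's score
    is equivalent to adding lam * sigma_t * dE/dx_i to its predicted noise.
    """
    eps_joint = eps_pred + lam * sigma_t * energy_grad(x_views)
    return w_t * (eps_joint - eps)

# Toy usage: 4 views, each flattened to 8 values.
rng = np.random.default_rng(0)
x_t = rng.normal(size=(4, 8))       # noisy renderings, one row per view
eps = rng.normal(size=(4, 8))       # noise that was added to each view
eps_pred = rng.normal(size=(4, 8))  # stand-in for the diffusion model's prediction

g_sds = sds_residual(eps_pred, eps, w_t=1.0)
g_joint = joint_residual(x_t, eps_pred, eps, w_t=1.0, sigma_t=0.5,
                         energy_grad=mean_deviation_energy_grad, lam=0.1)
```

With `lam=0`, or when all views are identical so the energy gradient vanishes, the joint residual reduces to the per-view SDS residual. This is the sense in which SDS, as described above, ignores coherence across views.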
Taxonomy
Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Natural Language Processing Techniques
Methods: Contrastive Language-Image Pre-training · Diffusion
