JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

Chenhan Jiang; Yihan Zeng; Tianyang Hu; Songcun Xu; Wei Zhang; Hang Xu; Dit-Yan Yeung

arXiv:2407.12291·cs.CV·October 15, 2024

TL;DR

JointDreamer introduces a joint score distillation method that improves 3D consistency and text alignment in text-to-3D generation by modeling the coherence among multiple views, surpassing previous view-independent approaches.

Contribution

The paper proposes Joint Score Distillation (JSD), a framework that enforces coherence across views in text-to-3D generation, reducing 3D inconsistency while maintaining text fidelity.

Findings

01 Reduces 3D inconsistency in generated models

02 Achieves 88.5% CLIP R-Precision in text alignment

03 Improves geometric and texture fidelity
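
For readers unfamiliar with the CLIP R-Precision figure quoted above, the metric is commonly computed as a retrieval accuracy: each rendered image is matched against a pool of prompts, and it counts as a hit when its own prompt ranks in the top R by embedding similarity. The following is a minimal sketch under stated assumptions, not the paper's evaluation code: the function name `r_precision` is hypothetical, and random vectors stand in for real CLIP image and text embeddings.

```python
import numpy as np

def r_precision(image_emb, text_emb, r=1):
    """Fraction of images whose own prompt ranks in the top-r by cosine similarity.

    Assumes row i of image_emb was generated from the prompt in row i of text_emb.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = img @ txt.T                              # (n_images, n_prompts)
    top_r = np.argsort(-sim, axis=1)[:, :r]        # best-matching prompt indices
    targets = np.arange(sim.shape[0])[:, None]     # index of each image's own prompt
    hits = (top_r == targets).any(axis=1)
    return float(hits.mean())

# Synthetic stand-ins for embeddings (assumption: real use would call a CLIP encoder).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(8, 16))
image_emb = text_emb + 0.01 * rng.normal(size=(8, 16))
score = r_precision(image_emb, text_emb)
```

In practice the image embeddings would come from renders of each generated 3D asset, and the score is usually averaged over several views per asset.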

Abstract

Score Distillation Sampling (SDS) by well-trained 2D diffusion models has shown great promise in text-to-3D generation. However, this paradigm distills view-agnostic 2D image distributions into the rendering distribution of 3D representation for each view independently, overlooking the coherence across views and yielding 3D inconsistency in generations. In this work, we propose Joint Score Distillation (JSD), a new paradigm that ensures coherent 3D generations. Specifically, we model the joint image distribution, which introduces an energy function to capture the coherence among denoised images from the diffusion model. We then derive the joint score distillation on multiple rendered views of the 3D representation, as opposed to a single view in SDS. In addition, we instantiate three universal view-aware models as energy functions, demonstrating compatibility…
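
The contrast the abstract draws, per-view distillation versus a joint term that couples views through an energy function, can be illustrated with a toy sketch. This is not the paper's derivation or implementation: the quadratic energy, the `coupling` weight, the sign convention, and all function names are assumptions chosen for illustration, and random arrays stand in for diffusion-model outputs. The per-view term is the familiar SDS residual (predicted noise minus injected noise).

```python
import numpy as np

def sds_residual(noise_pred, noise):
    """Per-view SDS residual; each row (view) is treated independently."""
    return noise_pred - noise

def coherence_energy(views):
    """Hypothetical energy: half the squared deviation of each view from the cross-view mean."""
    mean_view = views.mean(axis=0, keepdims=True)
    return 0.5 * float(np.sum((views - mean_view) ** 2))

def coherence_energy_grad(views):
    """Analytic gradient of coherence_energy with respect to each view."""
    mean_view = views.mean(axis=0, keepdims=True)
    return views - mean_view

def joint_residual(noise_pred, noise, denoised, coupling=0.1):
    """Per-view residual plus an assumed cross-view coupling term from the energy gradient."""
    return sds_residual(noise_pred, noise) + coupling * coherence_energy_grad(denoised)

# Random stand-ins for diffusion outputs on 4 rendered views (assumption).
rng = np.random.default_rng(0)
num_views, dim = 4, 6
noise = rng.normal(size=(num_views, dim))
noise_pred = rng.normal(size=(num_views, dim))
denoised = rng.normal(size=(num_views, dim))

independent = sds_residual(noise_pred, noise)
joint = joint_residual(noise_pred, noise, denoised)
```

When all denoised views agree, the coupling term vanishes and the joint residual reduces to the independent one; the energy only contributes when views disagree. The paper instead instantiates the energy with view-aware models, which this sketch does not attempt to reproduce.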

Taxonomy

Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training · Diffusion