Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching

Weimin Bai; Yubo Li; Wenzheng Chen; Weijian Luo; He Sun

arXiv:2506.13594·cs.CV·June 17, 2025

Open Access

TL;DR

Dive3D is a text-to-3D generation framework that replaces the KL-based Score Distillation Sampling (SDS) loss with Score Implicit Matching (SIM) loss, yielding more diverse 3D assets with better text alignment and visual fidelity.

Contribution

It replaces KL-based objectives with Score Implicit Matching (SIM) loss and integrates diffusion distillation and reward-guided optimization under a unified divergence perspective, improving diversity and quality over existing methods.

Findings

01. Outperforms prior methods in diversity and visual fidelity

02. Achieves higher scores on text-asset alignment and plausibility

03. Demonstrates robustness across various prompts and benchmarks

Abstract

Distilling pre-trained 2D diffusion models into 3D assets has driven remarkable advances in text-to-3D synthesis. However, existing methods typically rely on Score Distillation Sampling (SDS) loss, which involves asymmetric KL divergence--a formulation that inherently favors mode-seeking behavior and limits generation diversity. In this paper, we introduce Dive3D, a novel text-to-3D generation framework that replaces KL-based objectives with Score Implicit Matching (SIM) loss, a score-based objective that effectively mitigates mode collapse. Furthermore, Dive3D integrates both diffusion distillation and reward-guided optimization under a unified divergence perspective. Such reformulation, together with SIM loss, yields significantly more diverse 3D outputs while improving text alignment, human preference, and overall visual fidelity. We validate Dive3D across various 2D-to-3D prompts…
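
The mode-seeking behavior that the abstract attributes to the asymmetric KL divergence can be illustrated with a toy 1D example. The sketch below is not the paper's method and does not implement SIM or SDS; it only fits a single Gaussian to a bimodal target under the reverse KL, KL(q || p), and the forward KL, KL(p || q). The target distribution, the grid, and every name in the code are assumptions made for illustration.

```python
import numpy as np

# Integration grid for the 1D densities (illustrative choice).
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated on x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Toy bimodal target p: two equally weighted modes at -3 and +3.
p = 0.5 * gaussian(x, -3.0, 1.0) + 0.5 * gaussian(x, 3.0, 1.0)

def kl(a, b, eps=1e-300):
    """Riemann-sum approximation of KL(a || b) on the grid."""
    return float(np.sum(a * (np.log(a + eps) - np.log(b + eps))) * dx)

# Fit q = N(mu, 1) by a grid search over mu under each divergence.
candidates = np.linspace(-5.0, 5.0, 201)
reverse = [kl(gaussian(x, m, 1.0), p) for m in candidates]  # KL(q || p)
forward = [kl(p, gaussian(x, m, 1.0)) for m in candidates]  # KL(p || q)

mu_reverse = float(candidates[int(np.argmin(reverse))])
mu_forward = float(candidates[int(np.argmin(forward))])

print("reverse-KL minimizer mu:", mu_reverse)
print("forward-KL minimizer mu:", mu_forward)
```

In this toy setting the reverse-KL fit settles on one of the two modes and ignores the other, while the forward-KL fit sits between them to cover both. The first behavior is the kind of collapse onto a single mode that limits diversity when a KL-based objective of this form is used for distillation.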

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling

Methods: Diffusion