TV-3DG: Mastering Text-to-3D Customized Generation with Visual Prompt
Jiahui Yang, Donglin Di, Baorui Ma, Xun Yang, Yongjia Ma, Wenzhang Sun, Wei Chen, Jianxun Cui, Zhou Xue, Meng Wang, Yebin Liu

TL;DR
This paper introduces TV-3DG, a novel method for text-to-3D generation that effectively incorporates visual prompts and improves quality by addressing limitations of existing score distillation sampling techniques.
Contribution
We propose Classifier Score Matching (CSM) to replace SDS, enabling better multi-condition customization, and integrate visual prompts with attention and calibration modules for high-quality 3D generation.
Findings
CSM reduces noise and improves stability in 3D generation.
Visual prompts enhance customization and detail in generated 3D models.
Experiments show TV-3DG outperforms previous methods in quality and stability.
Abstract
In recent years, advancements in generative models have significantly expanded the capabilities of text-to-3D generation. Many approaches rely on Score Distillation Sampling (SDS) technology. However, SDS struggles to accommodate multi-condition inputs, such as text and visual prompts, in customized generation tasks. To explore the core reasons, we decompose SDS into a difference term and a classifier-free guidance term. Our analysis identifies the core issue as arising from the difference term and the random noise addition during the optimization process, both contributing to deviations from the target mode during distillation. To address this, we propose a novel algorithm, Classifier Score Matching (CSM), which removes the difference term in SDS and uses a deterministic noise addition process to reduce noise during optimization, effectively overcoming the low-quality limitations of…
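The decomposition described in the abstract can be illustrated with a small numerical sketch. It assumes the commonly used form of the SDS update direction, in which the classifier-free-guided noise prediction has the injected noise subtracted from it; the notation in the paper itself may differ. All function names here are hypothetical, random vectors stand in for the outputs of a diffusion model, and the timestep weighting and the renderer Jacobian are omitted. The deterministic noise addition process that the abstract mentions is not modeled.

```python
import random


def cfg_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def sds_direction(eps_uncond, eps_cond, eps, scale):
    """Assumed SDS update direction: guided prediction minus injected noise."""
    guided = cfg_noise(eps_uncond, eps_cond, scale)
    return [g - e for g, e in zip(guided, eps)]


def difference_term(eps_uncond, eps):
    """Unconditional prediction minus the injected random noise."""
    return [u - e for u, e in zip(eps_uncond, eps)]


def guidance_term(eps_uncond, eps_cond, scale):
    """Scaled gap between conditional and unconditional predictions.

    Under the assumptions above, this is the part that remains once the
    difference term is removed.
    """
    return [scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Stand-in values (assumptions): a real pipeline would obtain eps_uncond and
# eps_cond from a diffusion model evaluated on a noised rendering.
rng = random.Random(0)
dim = 4
eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
eps_uncond = [rng.gauss(0.0, 1.0) for _ in range(dim)]
eps_cond = [rng.gauss(0.0, 1.0) for _ in range(dim)]
scale = 7.5  # illustrative guidance scale, not a value from the paper

sds = sds_direction(eps_uncond, eps_cond, eps, scale)
diff = difference_term(eps_uncond, eps)
guide = guidance_term(eps_uncond, eps_cond, scale)

# The two terms add back up to the full SDS direction.
recombined = [d + g for d, g in zip(diff, guide)]
max_err = max(abs(a - b) for a, b in zip(sds, recombined))
```

In this sketch the guidance term does not depend on the sampled noise `eps` at all, while the difference term does. That is one way to read the abstract's claim that the difference term and the random noise addition both contribute to deviations during distillation.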
Taxonomy
Topics: Human Motion and Animation
Methods: Softmax · Attention Is All You Need
