TV-3DG: Mastering Text-to-3D Customized Generation with Visual Prompt

Jiahui Yang; Donglin Di; Baorui Ma; Xun Yang; Yongjia Ma; Wenzhang Sun; Wei Chen; Jianxun Cui; Zhou Xue; Meng Wang; Yebin Liu

arXiv:2410.21299·cs.CV·November 1, 2024

TL;DR

This paper introduces TV-3DG, a novel method for text-to-3D generation that effectively incorporates visual prompts and improves quality by addressing limitations of existing score distillation sampling techniques.

Contribution

We propose Classifier Score Matching (CSM) to replace SDS, enabling better multi-condition customization, and integrate visual prompts with attention and calibration modules for high-quality 3D generation.

Findings

1. CSM reduces noise and improves stability in 3D generation.

2. Visual prompts enhance customization and detail in generated 3D models.

3. Experiments show TV-3DG outperforms previous methods in quality and stability.

Abstract

In recent years, advancements in generative models have significantly expanded the capabilities of text-to-3D generation. Many approaches rely on Score Distillation Sampling (SDS) technology. However, SDS struggles to accommodate multi-condition inputs, such as text and visual prompts, in customized generation tasks. To explore the core reasons, we decompose SDS into a difference term and a classifier-free guidance term. Our analysis identifies the core issue as arising from the difference term and the random noise addition during the optimization process, both contributing to deviations from the target mode during distillation. To address this, we propose a novel algorithm, Classifier Score Matching (CSM), which removes the difference term in SDS and uses a deterministic noise addition process to reduce noise during optimization, effectively overcoming the low-quality limitations of…
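The decomposition described in the abstract can be illustrated with a small numerical sketch. It assumes the common classifier-free guidance convention, in which the guided noise prediction is the unconditional prediction plus a scaled conditional-minus-unconditional gap; the paper's exact notation and weighting may differ. All function names here are hypothetical, the vectors are toy stand-ins for diffusion model outputs, and the guidance-only direction is only a reading of "removes the difference term", not the authors' CSM implementation. The deterministic noise addition step is not modeled.

```python
from typing import List, Tuple

Vector = List[float]


def sds_residual(eps_cond: Vector, eps_uncond: Vector, eps: Vector, scale: float) -> Vector:
    """SDS-style residual: guided noise prediction minus the injected noise.

    Assumes the guided prediction is eps_uncond + scale * (eps_cond - eps_uncond).
    """
    guided = [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
    return [g - e for g, e in zip(guided, eps)]


def decompose_sds_residual(
    eps_cond: Vector, eps_uncond: Vector, eps: Vector, scale: float
) -> Tuple[Vector, Vector]:
    """Split the residual into a difference term and a guidance term.

    difference = eps_uncond - eps                  (depends on the sampled noise)
    guidance   = scale * (eps_cond - eps_uncond)   (classifier-free guidance term)
    """
    difference = [u - e for u, e in zip(eps_uncond, eps)]
    guidance = [scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
    return difference, guidance


def guidance_only_direction(eps_cond: Vector, eps_uncond: Vector, scale: float) -> Vector:
    """Hypothetical update direction that keeps only the guidance term."""
    return [scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    # Toy values standing in for model outputs at one timestep.
    eps_cond, eps_uncond, eps = [0.5, -0.25], [0.25, 0.0], [1.0, -1.0]
    difference, guidance = decompose_sds_residual(eps_cond, eps_uncond, eps, scale=2.0)
    print("difference term:", difference)
    print("guidance term:  ", guidance)
    print("full residual:  ", sds_residual(eps_cond, eps_uncond, eps, scale=2.0))
```

In this toy setting the two terms sum exactly to the full residual, and only the difference term changes when a different noise sample `eps` is drawn, which is consistent with the abstract's claim that the difference term and random noise addition are the sources of deviation.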

Taxonomy

Topics: Human Motion and Animation

Methods: Softmax · Attention Is All You Need