Text-driven 3D Human Generation via Contrastive Preference Optimization

Pengfei Zhou; Xukun Shen; Yong Hu

arXiv:2502.08977 · cs.CV · March 26, 2025

TL;DR

This paper introduces a contrastive preference optimization framework that enhances 3D human generation from text by improving alignment and realism, especially for complex descriptions, through preference-guided score distillation sampling.

Contribution

The paper proposes a novel preference optimization module with negation preferences to better align 3D models with complex textual inputs, addressing limitations of existing SDS methods.

Findings

01. Achieves state-of-the-art alignment accuracy.

02. Improves texture realism and visual fidelity.

03. Effectively handles long and complex textual descriptions.

Abstract

Recent advances in Score Distillation Sampling (SDS) have improved 3D human generation from textual descriptions. However, existing methods still face challenges in accurately aligning 3D models with long and complex textual inputs. To address this challenge, we propose a novel framework that introduces contrastive preferences, where human-level preference models, guided by both positive and negative prompts, assist SDS for improved alignment. Specifically, we design a preference optimization module that integrates multiple models to comprehensively capture the full range of textual features. Furthermore, we introduce a negation preference module to mitigate over-optimization of irrelevant details by leveraging static-dynamic negation prompts, effectively preventing "reward hacking". Extensive experiments demonstrate that our method achieves state-of-the-art results, significantly…
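
The abstract gives no formulas, so the following is only a toy sketch of the general idea of a contrastive preference signal: several preference models score a rendered view against a positive prompt and a negative prompt, and the negative score is subtracted. Every name here (`cosine`, `contrastive_preference`, `neg_weight`) is hypothetical, the "preference models" are stand-in cosine similarities on small vectors, and nothing below should be read as the authors' actual module or loss.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector, Vector], float]


def cosine(a: Vector, b: Vector) -> float:
    """Stand-in 'preference model': cosine similarity of two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def contrastive_preference(
    render: Vector,
    positive: Vector,
    negative: Vector,
    scorers: Sequence[Scorer],
    neg_weight: float = 0.5,  # assumed hyperparameter, not from the paper
) -> float:
    """Average positive-prompt score minus weighted negative-prompt score.

    Assumption: scores from multiple models are combined by a plain mean.
    """
    pos = sum(s(render, positive) for s in scorers) / len(scorers)
    neg = sum(s(render, negative) for s in scorers) / len(scorers)
    return pos - neg_weight * neg


if __name__ == "__main__":
    # Toy embeddings: the render matches the positive prompt exactly.
    render = [1.0, 0.0]
    positive = [1.0, 0.0]
    negative = [0.0, 1.0]
    print(contrastive_preference(render, positive, negative, [cosine]))
```

In a real pipeline such a scalar would presumably be computed on differentiable renders and combined with the SDS gradient; how the paper does this, and how its static-dynamic negation prompts are built, is not stated in the abstract.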

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Video Analysis and Summarization