Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

Cuican Yu; Guansong Lu; Yihan Zeng; Jian Sun; Xiaodan Liang; Huibin Li; Zongben Xu; Songcen Xu; Wei Zhang; Hang Xu

arXiv:2308.16758·cs.CV·September 1, 2023

TL;DR

This paper introduces TG-3DFace, a novel method for generating realistic 3D faces from text descriptions using only 2D face data, with techniques to ensure semantic consistency and high-quality outputs.

Contribution

The paper proposes a text-guided 3D face generation framework that learns from 2D face data and introduces cross-modal alignment and classifier guidance for improved realism and diversity.

Findings

01

Improves multi-view consistency by 9% over Latent3D

02

Achieves better FID and CLIP scores than 2D face/image models

03

Generates more realistic and semantically consistent 3D faces

Abstract

Generating 3D faces from textual descriptions has a multitude of applications, such as gaming, movies, and robotics. Recent progress has demonstrated the success of unconditional 3D face generation and text-to-3D shape generation. However, due to the limited availability of text-3D face data pairs, text-driven 3D face generation remains an open problem. In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces using text guidance. Specifically, we adopt an unconditional 3D face generation framework and equip it with text conditions, so that it learns text-guided 3D face generation from only text-2D face data. On top of that, we propose two text-to-face cross-modal alignment techniques, global contrastive learning and a fine-grained alignment module, to facilitate high semantic consistency between generated 3D faces and…
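
The abstract names global contrastive learning as one of the two alignment techniques but does not give its formulation here. As a rough illustration of the general idea only, the sketch below implements a generic CLIP-style symmetric InfoNCE loss over a batch of matched text and image embeddings. The function names, the temperature value, and the use of plain NumPy arrays as embeddings are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def global_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched (text, image) embeddings.

    Hypothetical helper: row i of text_emb is assumed to describe row i
    of image_emb. The temperature default is an assumed value.
    """
    # L2-normalize so that dot products are cosine similarities.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)

    # (B, B) similarity matrix; the diagonal holds the matched pairs.
    logits = t @ v.T / temperature
    idx = np.arange(logits.shape[0])

    # Text-to-image: each text should pick out its own image (rows).
    loss_t2i = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Image-to-text: each image should pick out its own text (columns).
    loss_i2t = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_t2i + loss_i2t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(8, 16))
    # Nearly identical embeddings stand in for well-aligned pairs.
    aligned = global_contrastive_loss(text, text + 0.01 * rng.normal(size=(8, 16)))
    # Unrelated embeddings stand in for misaligned pairs.
    unrelated = global_contrastive_loss(text, rng.normal(size=(8, 16)))
    print(aligned < unrelated)
```

In this toy setup the loss is lower when each text embedding is close to its paired image embedding than when the pairs are unrelated, which is the behavior a global alignment objective of this kind is meant to reward.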

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis

Methods: Contrastive Learning · Contrastive Language-Image Pre-training