ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment

Yicheng Zhong; Huawei Wei; Peiji Yang; Zhisheng Wang

arXiv:2308.14448·cs.CV·September 12, 2023

TL;DR

ExpCLIP introduces a flexible, language-driven approach for controlling facial expressions in speech-driven animation by aligning text prompts with facial styles using a novel dataset and CLIP-based model.

Contribution

The paper presents TEAD, a Text-Expression Alignment Dataset built with an LLM-supported automatic annotation method, and ExpCLIP, a CLIP-based model that semantically aligns text and facial expressions, enabling style control in animations.

Findings

01. Supports arbitrary style control via natural language prompts.

02. Achieves semantically aligned text and facial expression embeddings.

03. Enhances diversity and expressiveness in facial animation styles.
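
To make the idea of "semantically aligned" embeddings concrete, here is a minimal sketch of what a shared embedding space allows: once text and expressions live in the same space, a prompt can be matched to an expression by cosine similarity. This is not the paper's model or training procedure. The function names (cosine_similarity, closest_expression) and the 3-D vectors are hypothetical stand-ins, and real embeddings would come from trained encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_expression(text_embedding, expression_embeddings):
    """Return the name of the expression most similar to the text embedding."""
    return max(
        expression_embeddings,
        key=lambda name: cosine_similarity(text_embedding, expression_embeddings[name]),
    )


# Toy, hand-written vectors (assumption: purely illustrative, not learned).
expression_embeddings = {
    "smile": [0.9, 0.1, 0.0],
    "frown": [0.0, 0.2, 0.9],
    "surprise": [0.1, 0.9, 0.1],
}

# Stand-in for an encoded prompt such as "a cheerful greeting" (hypothetical).
prompt_embedding = [0.8, 0.3, 0.1]

best = closest_expression(prompt_embedding, expression_embeddings)
print(best)
```

With these toy vectors the prompt lands nearest the "smile" entry. In an aligned space the same lookup would work for any free-form prompt, which is what makes language-driven style control possible without a fixed label set.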

Abstract

The objective of stylized speech-driven facial animation is to create animations that encapsulate specific emotional expressions. Existing methods often depend on pre-established emotional labels or facial expression templates, which may limit the necessary flexibility for accurately conveying user intent. In this research, we introduce a technique that enables the control of arbitrary styles by leveraging natural language as emotion prompts. This technique presents benefits in terms of both flexibility and user-friendliness. To realize this objective, we initially construct a Text-Expression Alignment Dataset (TEAD), wherein each facial expression is paired with several prompt-like descriptions. We propose an innovative automatic annotation method, supported by Large Language Models (LLMs), to expedite the dataset construction, thereby eliminating the substantial expense of manual…

Taxonomy

Topics: Human Motion and Animation · Face recognition and analysis