SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation

Changpeng Cai; Guinan Guo; Jiao Li; Junhao Su; Fei Shen; Chenghao He; Jing Xiao; Yuanxu Chen; Lei Dai; Feiyu Zhu

arXiv:2405.07257 · cs.CV · November 5, 2024

Open Access

TL;DR

SPEAK is a novel one-shot talking head generation framework that enables independent control of lip synchronization, facial emotions, and head poses, producing realistic and expressive facial animations from speech.

Contribution

The paper introduces the Inter-Reconstructed Feature Disentanglement (IRFD) module, which decouples facial features into three latent spaces, and a face editing module for emotion and pose control, advancing one-shot talking head generation.

Findings

01. Ensures lip synchronization with speech.
02. Enables decoupled control of facial features.
03. Produces realistic, expressive facial animations.

Abstract

Most earlier research on talking face generation has focused on synchronizing lip motion with speech content. However, head pose and facial emotion are equally important characteristics of natural faces. While audio-driven talking face generation has advanced notably, existing methods either overlook facial emotions or are limited to specific individuals and cannot be applied to arbitrary subjects. In this paper, we propose a novel one-shot Talking Head Generation framework (SPEAK) that, unlike general Talking Face Generation, enables emotional and postural control. Specifically, we introduce an Inter-Reconstructed Feature Disentanglement (IRFD) module to decouple facial features into three latent spaces. We then design a face editing module that fuses the speech content and facial latent codes into a single latent space. Subsequently, we…
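The abstract describes IRFD only as decoupling facial features into three latent spaces, and the face editing module only as combining speech content and facial codes into a single latent space. The sketch below is a hypothetical toy illustration of one plausible reading of "inter-reconstructed" disentanglement, not the authors' implementation: exchange one latent code (here, emotion) between two faces, decode, re-encode, exchange back, and check that both originals are recovered. All names (`FaceCodes`, `encode`, `decode`, `swap_emotion`, `inter_reconstruction_error`, `fuse`) are invented for illustration; the encoder is plain vector slicing, and the fusion step is assumed to be a single linear projection.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass(frozen=True)
class FaceCodes:
    """Hypothetical container for three disentangled facial latent codes."""
    identity: Vector
    emotion: Vector
    pose: Vector


def encode(face: Vector, d_id: int, d_emo: int) -> FaceCodes:
    # Toy encoder (assumption): the "face" is already a feature vector whose
    # first d_id entries are identity, the next d_emo are emotion, and the
    # remainder is head pose. A real encoder would be a learned network.
    return FaceCodes(
        identity=face[:d_id],
        emotion=face[d_id:d_id + d_emo],
        pose=face[d_id + d_emo:],
    )


def decode(codes: FaceCodes) -> Vector:
    # Toy decoder: the exact inverse of the toy encoder.
    return codes.identity + codes.emotion + codes.pose


def swap_emotion(a: FaceCodes, b: FaceCodes) -> Tuple[FaceCodes, FaceCodes]:
    # Exchange only the emotion codes; identity and pose stay with each subject.
    return (FaceCodes(a.identity, b.emotion, a.pose),
            FaceCodes(b.identity, a.emotion, b.pose))


def inter_reconstruction_error(face_a: Vector, face_b: Vector,
                               d_id: int, d_emo: int) -> float:
    a, b = encode(face_a, d_id, d_emo), encode(face_b, d_id, d_emo)
    # First pass: decode two faces whose emotion codes have been swapped.
    mixed_a, mixed_b = (decode(c) for c in swap_emotion(a, b))
    # Second pass: re-encode the mixed faces and swap the emotions back.
    back_a, back_b = swap_emotion(encode(mixed_a, d_id, d_emo),
                                  encode(mixed_b, d_id, d_emo))
    rec_a, rec_b = decode(back_a), decode(back_b)
    # Squared error against the originals; zero means the swap cycle
    # recovered both faces exactly.
    return sum((x - y) ** 2
               for x, y in zip(rec_a + rec_b, face_a + face_b))


def fuse(speech: Vector, codes: FaceCodes, weights: List[Vector]) -> Vector:
    # Hypothetical face-editing step: concatenate the speech content code with
    # the three facial codes and apply one linear projection (one output value
    # per row of `weights`) into a shared latent space.
    x = speech + codes.identity + codes.emotion + codes.pose
    return [sum(w * v for w, v in zip(row, x)) for row in weights]
```

With the slicing encoder the cycle is exact by construction, so `inter_reconstruction_error([1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0], 2, 2)` returns `0.0`. With learned encoders and decoders, a term of this kind could serve as a training loss that penalizes information leaking between latent spaces; that use is an assumption here, not a detail taken from the paper.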

Taxonomy

Topics: Speech and dialogue systems · Social Robot Interaction and HRI · Phonetics and Phonology Research