EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation

Guanwen Feng; Haoran Cheng; Yunan Li; Zhiyuan Ma; Chaoneng Li; Zhihao Qian; Qiguang Miao; Chi-Man Pun

arXiv:2402.01422 · cs.CV · February 5, 2024 · 1 citation

TL;DR

EmoSpeaker is a novel method for fine-grained emotion-controlled talking face generation that improves emotional expression and lip synchronization using a visual attribute-guided audio decoupler and emotion intensity control.

Contribution

The paper introduces a visual attribute-guided audio decoupler, a fine-grained emotion coefficient prediction module, and an emotion intensity control method, advancing emotion control in talking face generation.

Findings

01 Outperforms existing methods in expression variation

02 Enhances lip synchronization accuracy

03 Enables finer emotion intensity classification

Abstract

Fine-grained emotion control is crucial for emotion generation tasks: it enhances the expressive capability of the generative model, allowing it to capture and express nuanced emotional states accurately and comprehensively, and thereby improves the emotional quality and personalization of generated content. Generating fine-grained facial animations that accurately portray emotional expressions from only a portrait and an audio recording is challenging. To address this challenge, we propose a visual attribute-guided audio decoupler, which yields content vectors related solely to the audio content and thereby improves the stability of subsequent lip movement coefficient predictions. To achieve more precise emotional expression, we introduce a fine-grained emotion coefficient prediction module. Additionally, we propose an emotion intensity…

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis