Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

Jiadong Liang; Feng Lu

arXiv:2406.07895 · cs.CV · June 13, 2024 · 1 citation

TL;DR

This paper introduces a two-stage, audio-driven framework for generating emotionally expressive talking face videos. It aligns expression, gaze, and head pose with the emotion conveyed in the speech, using 3D facial landmarks as an intermediate representation and self-supervised learning.

Contribution

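The paper presents a two-stage approach: speech-to-landmarks synthesis, which produces emotionally aligned 3D facial landmarks, followed by landmarks-to-face generation, which renders the talking face video from those landmarks. The stated aim is improved realism and emotional coherence.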

Findings

01 Outperforms state-of-the-art methods in visual quality

02 Achieves better emotional alignment in generated videos

03 Demonstrates effectiveness on the MEAD dataset

Abstract

Vivid talking face generation holds immense potential applications across diverse multimedia domains, such as film and game production. While existing methods accurately synchronize lip movements with input audio, they typically ignore crucial alignments between emotion and facial cues, which include expression, gaze, and head pose. These alignments are indispensable for synthesizing realistic videos. To address these issues, we propose a two-stage audio-driven talking face generation framework that employs 3D facial landmarks as intermediate variables. This framework achieves collaborative alignment of expression, gaze, and pose with emotions through self-supervised learning. Specifically, we decompose this task into two key steps, namely speech-to-landmarks synthesis and landmarks-to-face generation. The first step focuses on simultaneously synthesizing emotionally aligned facial…
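The two-step decomposition described in the abstract can be pictured as a simple function composition. The sketch below is only an illustration of that data flow, not the authors' implementation: the function names, the landmark count (68), the tiny image size, and the placeholder arithmetic inside each stage are all assumptions made for this example, and the learned models are replaced by stubs.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sizes chosen for illustration; the abstract does not state them.
NUM_LANDMARKS = 68
IMAGE_SIZE = 4


@dataclass
class LandmarkFrame:
    """One frame of 3D facial landmarks: NUM_LANDMARKS points of (x, y, z)."""
    points: List[List[float]]


def speech_to_landmarks(audio_features: List[List[float]]) -> List[LandmarkFrame]:
    """Stage 1 stub: map each audio feature frame to one 3D landmark frame.

    A real model would predict landmarks that encode expression, gaze, and
    head pose; here a dummy offset stands in for that prediction.
    """
    frames = []
    for feat in audio_features:
        offset = sum(feat) / max(len(feat), 1)
        points = [[float(i) + offset, float(i) - offset, 0.0]
                  for i in range(NUM_LANDMARKS)]
        frames.append(LandmarkFrame(points))
    return frames


def landmarks_to_face(frames: List[LandmarkFrame]) -> List[List[List[float]]]:
    """Stage 2 stub: render one (blank) grayscale image per landmark frame."""
    return [[[0.0] * IMAGE_SIZE for _ in range(IMAGE_SIZE)] for _ in frames]


def generate_talking_face(audio_features: List[List[float]]) -> List[List[List[float]]]:
    """Compose the two stages: audio -> 3D landmarks -> video frames."""
    return landmarks_to_face(speech_to_landmarks(audio_features))
```

Keeping the landmarks as an explicit intermediate value means each stage can be inspected or swapped independently, which is the main point of the two-stage design as the abstract describes it.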

Taxonomy

Topics: Language, Metaphor, and Cognition · Social Robot Interaction and HRI · Speech and dialogue systems