EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model

Xinya Ji; Hang Zhou; Kaisiyuan Wang; Qianyi Wu; Wayne Wu; Feng Xu; Xun Cao

arXiv:2205.15278·cs.CV·September 26, 2022

TL;DR

This paper introduces EAMM, a model that generates one-shot emotional talking faces driven by audio and an emotion source video. It captures facial emotion and can be applied to arbitrary subjects.

Contribution

The paper presents a new emotion-aware motion model with an implicit emotion displacement learner for realistic emotional talking face synthesis.

Findings

01

Successfully generates emotional talking faces with realistic expressions

02

Applicable to arbitrary subjects without subject-specific training

03

Outperforms existing methods in emotion and motion realism

Abstract

Although significant progress has been made in audio-driven talking face generation, existing methods either neglect facial emotion or cannot be applied to arbitrary subjects. In this paper, we propose the Emotion-Aware Motion Model (EAMM) to generate one-shot emotional talking faces by involving an emotion source video. Specifically, we first propose an Audio2Facial-Dynamics module, which renders talking faces from audio-driven unsupervised zero- and first-order key-points motion. Then through exploring the motion model's properties, we further propose an Implicit Emotion Displacement Learner to represent emotion-related facial dynamics as linearly additive displacements to the previously acquired motion representations. Comprehensive experiments demonstrate that by incorporating the results from both modules, our method can generate satisfactory talking face results on arbitrary…
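The idea of "linearly additive displacements" on a keypoint motion representation can be illustrated with a small sketch. This is not the authors' implementation: the `Motion` container, the `add_displacement` helper, the `scale` parameter, the choice to displace both positions and Jacobians, and all numeric values are assumptions made here for illustration. Zero-order motion is modeled as 2D keypoint positions and first-order motion as a 2x2 Jacobian per keypoint.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass
class Motion:
    """Keypoint motion: zero-order positions and first-order 2x2 Jacobians."""
    positions: List[Point]
    jacobians: List[Matrix2]


def add_displacement(neutral: Motion, delta: Motion, scale: float = 1.0) -> Motion:
    """Return neutral + scale * delta per keypoint (hypothetical helper)."""
    if (len(neutral.positions) != len(delta.positions)
            or len(neutral.jacobians) != len(delta.jacobians)):
        raise ValueError("keypoint counts differ")
    # Zero-order term: shift each keypoint position.
    positions = [
        (px + scale * dx, py + scale * dy)
        for (px, py), (dx, dy) in zip(neutral.positions, delta.positions)
    ]
    # First-order term: add the displacement entry by entry to each Jacobian.
    jacobians = [
        tuple(
            tuple(a + scale * b for a, b in zip(row_n, row_d))
            for row_n, row_d in zip(jac_n, jac_d)
        )
        for jac_n, jac_d in zip(neutral.jacobians, delta.jacobians)
    ]
    return Motion(positions, jacobians)


# Toy values for two keypoints; assumed for illustration, not from the paper.
audio_motion = Motion(
    positions=[(0.0, 0.5), (0.25, -0.5)],
    jacobians=[((1.0, 0.0), (0.0, 1.0)), ((1.0, 0.0), (0.0, 1.0))],
)
emotion_delta = Motion(
    positions=[(0.5, 0.0), (0.0, 0.25)],
    jacobians=[((0.0, 0.0), (0.0, 0.0)), ((0.5, 0.0), (0.0, 0.0))],
)
emotional_motion = add_displacement(audio_motion, emotion_delta)
```

In this sketch, setting `scale` to `0.0` returns the audio-driven motion unchanged, which is the sense in which the emotion term is a separable, additive correction on top of the audio-driven representation.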

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing