Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

Yuan Gan; Zongxin Yang; Xihang Yue; Lingyun Sun; Yi Yang

arXiv:2309.04946 · cs.SD · October 13, 2023 · 1 citation

TL;DR

This paper introduces a cost-effective method to enable emotion control in audio-driven talking-head generation by adapting existing models with lightweight, parameter-efficient modules, achieving state-of-the-art results.

Contribution

The proposed EAT method transforms emotion-agnostic models into emotion-controllable ones using lightweight adaptations, reducing training costs and improving flexibility.

Findings

01. Achieves state-of-the-art performance on LRW and MEAD benchmarks.

02. Demonstrates strong generalization with scarce or no emotional training data.

03. Introduces three lightweight modules for emotion control.

Abstract

Audio-driven talking-head synthesis is a popular research topic for virtual human-related applications. However, the inflexibility and inefficiency of existing methods, which necessitate expensive end-to-end training to transfer emotions from guidance videos to talking-head predictions, are significant limitations. In this work, we propose the Emotional Adaptation for Audio-driven Talking-head (EAT) method, which transforms emotion-agnostic talking-head models into emotion-controllable ones in a cost-effective and efficient manner through parameter-efficient adaptations. Our approach utilizes a pretrained emotion-agnostic talking-head transformer and introduces three lightweight adaptations (the Deep Emotional Prompts, Emotional Deformation Network, and Emotional Adaptation Module) from different perspectives to enable precise and realistic emotion controls. Our experiments demonstrate…
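The abstract names Deep Emotional Prompts as one of the three adaptations but does not spell out the mechanism here. The sketch below illustrates the general idea of deep prompt tuning under loud assumptions: a frozen backbone, with a small set of learnable per-emotion tokens prepended at every layer. It is not the authors' implementation. `FrozenLayer`, `DeepPromptedBackbone`, the mean-mixing operation (a stand-in for attention), and all sizes are hypothetical and chosen only to keep the example dependency-free.

```python
import random


class FrozenLayer:
    """Hypothetical stand-in for one pretrained, emotion-agnostic layer."""

    def __init__(self, dim, seed):
        rng = random.Random(seed)
        # Frozen weights: never updated during emotional adaptation.
        self.weight = [rng.uniform(0.5, 1.0) for _ in range(dim)]

    def __call__(self, tokens):
        # Toy token mixing: add the sequence mean to every token, then scale.
        n, dim = len(tokens), len(self.weight)
        mean = [sum(t[d] for t in tokens) / n for d in range(dim)]
        return [[self.weight[d] * (t[d] + mean[d]) for d in range(dim)]
                for t in tokens]


class DeepPromptedBackbone:
    """Frozen layers plus per-emotion prompt tokens injected at every layer."""

    def __init__(self, layers, emotions, num_prompts, dim, seed=0):
        rng = random.Random(seed)
        self.layers = layers
        # The only trainable parameters in this sketch: one prompt set
        # per (emotion, layer) pair.
        self.prompts = {
            emo: [[[rng.gauss(0, 1) for _ in range(dim)]
                   for _ in range(num_prompts)]
                  for _ in layers]
            for emo in emotions
        }

    def forward(self, tokens, emotion):
        for layer, prompts in zip(self.layers, self.prompts[emotion]):
            out = layer(prompts + tokens)   # prepend prompts, run frozen layer
            tokens = out[len(prompts):]     # drop prompt outputs, keep content
        return tokens


dim = 4
backbone = [FrozenLayer(dim, seed=i) for i in range(3)]
model = DeepPromptedBackbone(backbone, ["neutral", "happy"],
                             num_prompts=2, dim=dim)
audio_tokens = [[0.1 * (i + d) for d in range(dim)] for i in range(5)]
neutral = model.forward(audio_tokens, "neutral")
happy = model.forward(audio_tokens, "happy")
```

In this toy setup the same audio tokens yield different outputs depending on the selected emotion, while the backbone weights stay untouched. That is the sense in which such an adaptation is parameter-efficient: only the prompt tokens would be trained.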

Code & Models

Repositories

yuangan/eat_code (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis