Emotional Listener Portrait: Neural Listener Head Generation with Emotion

Luchuan Song; Guojun Yin; Zhenchao Jin; Xiaoyi Dong; Chenliang Xu

arXiv:2310.00068 · cs.GR · October 10, 2023

TL;DR

This paper introduces the Emotional Listener Portrait (ELP), a model that generates diverse and controllable listener facial responses in conversations by modeling discrete facial motions conditioned on emotions.

Contribution

The paper presents a novel discrete motion-codeword based model that explicitly captures emotion-dependent facial behaviors for listener head generation.

Findings

1. ELP outperforms previous methods on quantitative metrics.
2. ELP can generate both natural and controllable listener responses.
3. The model effectively captures emotion-dependent facial motion distributions.

Abstract

Listener head generation centers on generating non-verbal behaviors (e.g., a smile) of a listener in response to the information delivered by a speaker. A significant challenge in generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation, which vary depending on the emotions and attitudes of both the speaker and the listener. To tackle this problem, we propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords and explicitly models the probability distribution of the motions under different emotions in conversation. Benefiting from the "explicit" and "discrete" design, our ELP model can not only automatically generate natural and diverse responses toward a given speaker via sampling from the learned distribution but also generate…
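To make the "discrete" and "explicit" idea in the abstract concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a facial motion is formed by summing one codeword from each of several small codebooks, and that each emotion has an explicit categorical distribution over codeword indices. The sizes, the emotion labels, the probability tables, the summation rule, and the names `make_codebooks` and `sample_motion` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
NUM_SLOTS = 3       # number of codewords composed into one motion
CODEBOOK_SIZE = 4   # entries per codebook
MOTION_DIM = 2      # dimensionality of a toy motion vector

# Hypothetical emotion-conditioned categorical distributions:
# one probability row per slot, one entry per codeword index.
EMOTION_PROBS = {
    "positive": [[0.7, 0.1, 0.1, 0.1] for _ in range(NUM_SLOTS)],
    "neutral": [[0.25, 0.25, 0.25, 0.25] for _ in range(NUM_SLOTS)],
}


def make_codebooks(rng):
    """Build random toy codebooks: NUM_SLOTS x CODEBOOK_SIZE x MOTION_DIM."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(MOTION_DIM)]
         for _ in range(CODEBOOK_SIZE)]
        for _ in range(NUM_SLOTS)
    ]


def sample_motion(emotion, codebooks, rng):
    """Sample one codeword index per slot given the emotion, then compose.

    Composition by summation is an assumption of this sketch.
    """
    indices = [
        rng.choices(range(CODEBOOK_SIZE), weights=probs)[0]
        for probs in EMOTION_PROBS[emotion]
    ]
    motion = [
        sum(codebooks[slot][idx][d] for slot, idx in enumerate(indices))
        for d in range(MOTION_DIM)
    ]
    return indices, motion


if __name__ == "__main__":
    books = make_codebooks(random.Random(0))
    rng = random.Random(1)
    for emotion in ("positive", "neutral"):
        indices, motion = sample_motion(emotion, books, rng)
        print(emotion, indices, motion)
```

Repeated calls with the same emotion draw different index combinations, which is one way to read the "diverse responses via sampling" claim, while changing the `emotion` argument switches to a different distribution, a simple stand-in for controllable generation.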

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis