Emotional Listener Portrait: Neural Listener Head Generation with Emotion
Luchuan Song, Guojun Yin, Zhenchao Jin, Xiaoyi Dong, Chenliang Xu

TL;DR
This paper introduces the Emotional Listener Portrait (ELP), a model that generates diverse and controllable listener facial responses in conversations by modeling discrete facial motions conditioned on emotions.
Contribution
The paper presents a novel discrete motion-codeword based model that explicitly captures emotion-dependent facial behaviors for listener head generation.
Findings
ELP outperforms previous methods on quantitative metrics.
ELP can generate both natural and controllable listener responses.
The model effectively captures emotion-dependent facial motion distributions.
Abstract
Listener head generation centers on generating non-verbal behaviors (e.g., smile) of a listener in reference to the information delivered by a speaker. A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation, which vary depending on the emotions and attitudes of both the speaker and the listener. To tackle this problem, we propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords and explicitly models the probability distribution of the motions under different emotions in conversation. Benefiting from the "explicit" and "discrete" design, our ELP model can not only automatically generate natural and diverse responses toward a given speaker via sampling from the learned distribution but also generate…
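The idea of composing a motion from discrete codewords and sampling them from an emotion-dependent distribution can be illustrated with a toy sketch. This is not the authors' implementation: the codebook size, the number of codewords per motion, the emotion labels, the random "learned" probabilities, and the choice to compose codewords by summation are all assumptions made for illustration.

```python
import random

# All sizes and labels below are hypothetical, not taken from the paper.
NUM_CODEWORDS = 8    # entries in the motion codebook
MOTION_DIM = 4       # length of one facial-motion vector
NUM_SLOTS = 3        # codewords composed into a single motion
EMOTIONS = ("positive", "neutral", "negative")


def make_codebook(rng):
    """Build a random codebook; a real model would learn these vectors."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(MOTION_DIM)]
            for _ in range(NUM_CODEWORDS)]


def make_emotion_distributions(rng):
    """Return one categorical distribution over codewords per (emotion, slot).

    Random weights stand in for probabilities a trained model would predict.
    """
    dists = {}
    for emotion in EMOTIONS:
        slots = []
        for _ in range(NUM_SLOTS):
            weights = [rng.random() for _ in range(NUM_CODEWORDS)]
            total = sum(weights)
            slots.append([w / total for w in weights])
        dists[emotion] = slots
    return dists


def sample_motion(emotion, codebook, dists, rng):
    """Sample one codeword index per slot, then compose them into a motion.

    Composition by element-wise summation is an assumption of this sketch.
    """
    indices = [rng.choices(range(NUM_CODEWORDS), weights=probs, k=1)[0]
               for probs in dists[emotion]]
    motion = [sum(codebook[i][d] for i in indices) for d in range(MOTION_DIM)]
    return indices, motion


if __name__ == "__main__":
    rng = random.Random(0)
    codebook = make_codebook(rng)
    dists = make_emotion_distributions(rng)
    for emotion in EMOTIONS:
        indices, motion = sample_motion(emotion, codebook, dists, rng)
        print(emotion, indices, [round(v, 3) for v in motion])
```

Because each call draws fresh codeword indices, repeated sampling under the same emotion yields different motions (diversity), while switching the emotion key changes which distribution is sampled (control).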
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
