Responsive Listening Head Generation: A Benchmark Dataset and Baseline

Mohan Zhou; Yalong Bai; Wei Zhang; Ting Yao; Tiejun Zhao; Tao Mei

arXiv:2112.13548 · cs.CV · July 21, 2022

Open Access

TL;DR

This paper introduces ViCo, a new benchmark dataset for listening head generation during face-to-face conversations, and provides a baseline model to foster research in responsive non-verbal feedback synthesis.

Contribution

The paper presents the first comprehensive dataset and baseline for listening head generation, enabling real-time synthesis of listener responses conditioned on speaker signals.

Findings

01. ViCo dataset includes 92 identities and 483 clips with diverse listening styles.

02. Baseline model demonstrates effective real-time listening head synthesis.

03. Dataset and code are publicly available for research use.

Abstract

We present a new listening head generation benchmark for synthesizing the responsive feedback of a listener (e.g., nod, smile) during a face-to-face conversation. As the indispensable complement to talking head generation, listening head generation has seldom been studied in the literature. Automatically synthesizing listening behavior that actively responds to a talking head is critical to applications such as digital humans, virtual agents and social robots. In this work, we propose a novel dataset, "ViCo", highlighting listening head generation during a face-to-face conversation. A total of 92 identities (67 speakers and 76 listeners) are involved in ViCo, featuring 483 clips in a paired "speaking-listening" pattern, where listeners show three listening styles based on their attitudes: positive, neutral, negative. Different from traditional speech-to-gesture or talking-head…
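The paired "speaking-listening" pattern and the three attitude labels described above can be pictured as a simple record type. The following is only an illustrative Python sketch: the class names, field names, identifiers and file names are hypothetical and do not reflect the actual ViCo file layout or annotation schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Attitude(Enum):
    """The three listening styles named in the abstract."""
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class PairedClip:
    """One speaking-listening pair (hypothetical schema, not ViCo's own)."""
    speaker_id: str
    listener_id: str
    attitude: Attitude
    speaker_video: str   # placeholder path to the speaker's clip
    listener_video: str  # placeholder path to the listener's clip


def count_by_attitude(clips):
    """Count how many clips carry each attitude label."""
    return Counter(clip.attitude for clip in clips)


# Made-up example records, for illustration only.
clips = [
    PairedClip("spk_a", "lst_x", Attitude.POSITIVE, "a_x_spk.mp4", "a_x_lst.mp4"),
    PairedClip("spk_a", "lst_y", Attitude.NEGATIVE, "a_y_spk.mp4", "a_y_lst.mp4"),
    PairedClip("spk_b", "lst_x", Attitude.POSITIVE, "b_x_spk.mp4", "b_x_lst.mp4"),
]

counts = count_by_attitude(clips)
```

Because one identity may act as a speaker in some clips and as a listener in others, the speaker and listener fields are kept as separate identifiers rather than a single identity field.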

Taxonomy

Topics: Face recognition and analysis · Social Robot Interaction and HRI · Multimodal Machine Learning Applications