Responsive Listening Head Generation: A Benchmark Dataset and Baseline
Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

TL;DR
This paper introduces ViCo, a new benchmark dataset for listening head generation during face-to-face conversations, and provides a baseline model to foster research in responsive non-verbal feedback synthesis.
Contribution
The paper presents the first comprehensive dataset and baseline for listening head generation, enabling real-time synthesis of listener responses conditioned on speaker signals.
Findings
The ViCo dataset includes 92 identities (67 speakers and 76 listeners) and 483 paired "speaking-listening" clips covering three listening styles: positive, neutral, and negative.
Baseline model demonstrates effective real-time listening head synthesis.
Dataset and code are publicly available for research use.
Abstract
We present a new listening head generation benchmark for synthesizing the responsive feedback of a listener (e.g., nod, smile) during a face-to-face conversation. As the indispensable complement to talking head generation, listening head generation has seldom been studied in the literature. Automatically synthesizing listening behavior that actively responds to a talking head is critical to applications such as digital humans, virtual agents, and social robots. In this work, we propose a novel dataset, "ViCo", highlighting listening head generation during a face-to-face conversation. A total of 92 identities (67 speakers and 76 listeners) are involved in ViCo, featuring 483 clips in a paired "speaking-listening" pattern, where listeners show three listening styles based on their attitudes: positive, neutral, and negative. Different from traditional speech-to-gesture or talking-head…
Taxonomy
Topics: Face recognition and analysis · Social Robot Interaction and HRI · Multimodal Machine Learning Applications
