Interactive Conversational Head Generation
Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao

TL;DR
This paper introduces a new benchmark and datasets for synthesizing interactive conversational heads capable of both talking and listening, enabling more realistic face-to-face virtual interactions.
Contribution
It presents novel datasets, ViCo and ViCo-X, and defines three new tasks for interactive conversational head generation, advancing the creation of responsive and expressive virtual agents.
Findings
Baseline methods can generate responsive, vivid conversational agents.
The datasets facilitate training models for multi-turn face-to-face interactions.
Experimental results demonstrate effective collaboration between virtual agents and real persons.
Abstract
We introduce a new conversation head generation benchmark for synthesizing the behaviors of a single interlocutor in a face-to-face conversation. The capability to automatically synthesize interlocutors that can participate in long, multi-turn conversations is vital and offers benefits for various applications, including digital humans, virtual agents, and social robots. Existing research primarily focuses on talking head generation (one-way interaction), which hinders the creation of a digital human for conversational (two-way) interaction because the listening and interaction parts are absent. In this work, we construct two datasets to address this issue: "ViCo", for independent talking and listening head generation tasks at the sentence level, and "ViCo-X", for synthesizing interlocutors in multi-turn conversational scenarios. Based on ViCo and ViCo-X, we define three novel…
Taxonomy
Topics: Social Robot Interaction and HRI · Speech and dialogue systems · Face recognition and analysis
Methods: Attentive Walk-Aggregating Graph Neural Network
