Exchanging... Watch out!
Liu Yang, Jieyeon Woo, Catherine Achard, Catherine Pelachaud

TL;DR
This paper analyzes multimodal features, such as prosody and facial expressions, of exchanges in human-human conversations, with the aim of improving how embodied conversational agents (ECAs) manage those exchanges.
Contribution
It presents a detailed annotation and analysis of multimodal cues in human-human conversations, intended to make agents more responsive and natural.
Findings
Prosodic features like pitch and loudness vary across exchange types.
Facial expressions provide additional cues for turn-taking signals.
Multimodal analysis improves understanding of conversational dynamics.
Abstract
During a conversation, individuals take turns speaking and engage in exchanges, which can occur smoothly or involve interruptions. Listeners have various ways of participating, such as displaying backchannels, signalling the aim to take a turn, waiting for the speaker to yield the floor, or even interrupting and taking over the conversation. These exchanges are commonplace in natural interactions. To create realistic and engaging interactions between human participants and embodied conversational agents (ECAs), it is crucial to equip virtual agents with the ability to manage these exchanges. This includes being able to initiate or respond to signals from the human user. In order to achieve this, we annotate, analyze and characterize these exchanges in human-human conversations. In this paper, we present an analysis of multimodal features, with a focus on prosodic features such as…
Taxonomy
Topics: Social Robot Interaction and HRI · Speech and dialogue systems · Language and cultural evolution
