Active Listener: Continuous Generation of Listener's Head Motion Response in Dyadic Interactions

Bishal Ghosh; Emma Li; Tanaya Guha

arXiv:2409.20188 · cs.RO · October 1, 2024

TL;DR

This paper presents a real-time, data-driven model that generates natural listener head movements in response to speaker speech, advancing non-verbal communication in human-robot interactions.

Contribution

It introduces the first end-to-end graph-based model for continuous, real-time listener head motion generation directly from speech audio, without manual annotations.

Findings

01 Achieves a low error of 4.5 degrees in head pose prediction

02 Operates at a high frame rate suitable for real-world deployment

03 Demonstrates effectiveness on IEMOCAP dyadic interaction data
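
To make the head pose error figure concrete, here is a minimal sketch of how an angular error in degrees could be computed over predicted (roll, pitch, yaw) frames. It assumes the metric is a mean absolute error with angle wrap-around, which this page does not confirm; the function name `mean_absolute_angle_error` and the sample values are hypothetical and are not taken from the paper or its repository.

```python
def mean_absolute_angle_error(pred, target):
    """Mean absolute error in degrees over (roll, pitch, yaw) frames.

    Assumption: the error is averaged over all frames and all three angles.
    Differences are wrapped into [-180, 180), so 179 vs -179 counts as 2 degrees.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = 0.0
    count = 0
    for p_frame, t_frame in zip(pred, target):
        for p, t in zip(p_frame, t_frame):
            # Wrap the signed difference into [-180, 180) before taking |.|
            diff = (p - t + 180.0) % 360.0 - 180.0
            total += abs(diff)
            count += 1
    return total / count


# Hypothetical example: two frames of (roll, pitch, yaw) in degrees.
predicted = [(0.0, 0.0, 0.0), (10.0, -5.0, 179.0)]
reference = [(3.0, 0.0, 0.0), (10.0, -5.0, -179.0)]
error_deg = mean_absolute_angle_error(predicted, reference)
```

The wrap-around step matters only when angles sit near the ±180 degree boundary; for the small head rotations typical of listening behaviour it reduces to a plain absolute difference.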

Abstract

A key component of dyadic spoken interactions is contextually relevant non-verbal gestures, such as head movements that reflect a listener's response to the interlocutor's speech. Although significant progress has been made in generating co-speech gestures, generating a listener's response has remained a challenge. We introduce the task of generating a listener's continuous head motion in response to the speaker's speech in real time. To this end, we propose a graph-based end-to-end crossmodal model that takes the interlocutor's speech audio as input and directly generates head pose angles (roll, pitch, yaw) of the listener in real time. Different from previous work, our approach is completely data-driven and neither requires manual annotations nor oversimplifies head motion to mere nods and shakes. Extensive evaluation on the dyadic interaction sessions on the…

Code & Models

Repositories

bigzen/Active-Listener (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing