To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills

Carlo Mazzola; Marta Romeo; Francesco Rea; Alessandra Sciutti; Angelo Cangelosi

arXiv:2308.10757·cs.LG·March 29, 2024

TL;DR

This paper presents a deep learning model that enables social robots to estimate the intended addressee of human utterances by analyzing non-verbal cues, facilitating more natural human-robot interactions.

Contribution

It introduces a hybrid deep learning approach combining CNNs and LSTMs for addressee estimation using visual and bodily cues, optimized for deployment on social robots.

Findings

01. The model accurately localizes the addressee in space from the robot's perspective

02. Face images and body posture vectors are used effectively for addressee detection

03. Potential for improved social robot interaction capabilities

Abstract

Communicating shapes our social world. For a robot to be considered social and consequently be integrated into our social environment, it is fundamental that it understand some of the dynamics that rule human-human communication. In this work, we tackle the problem of Addressee Estimation, the ability to understand an utterance's addressee, by interpreting and exploiting non-verbal bodily cues from the speaker. We do so by implementing a hybrid deep learning model composed of convolutional layers and LSTM cells, taking as input images portraying the face of the speaker and 2D vectors of the speaker's body posture. Our implementation choices were guided by the aim of developing a model that could be deployed on social robots and be efficient in ecological scenarios. We demonstrate that our model is able to solve the Addressee Estimation problem in terms of addressee localisation in space, from a…

Code & Models

Repositories

https://gitlab.iit.it/cognitiveInteraction/addressee_estimation_ijcnn23.git
Official

Taxonomy

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
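
The three methods listed above meet inside a single LSTM cell: sigmoid activations act as gates with values between 0 and 1, and tanh squashes the candidate update and the cell output into the range -1 to 1. The following is a minimal sketch of one step of a standard scalar LSTM cell in plain Python. It is not the paper's implementation; the function names and the weights in `PARAMS` are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function; used for the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell (one input feature, one hidden unit).

    params maps each gate name to a (w_x, w_h, bias) tuple.
    Returns the new hidden state h and cell state c.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("input"))        # how much new information to write
    f = sigmoid(pre_activation("forget"))       # how much old cell state to keep
    o = sigmoid(pre_activation("output"))       # how much of the cell to expose
    g = math.tanh(pre_activation("candidate"))  # candidate update

    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Hypothetical weights, for illustration only (not taken from the paper).
PARAMS = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.3, 0.2, 1.0),
    "output": (0.4, 0.1, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}


def run_sequence(xs, params=PARAMS):
    """Feed a sequence of scalar inputs through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h, c
```

Calling `run_sequence([0.2, -0.4, 0.9])` returns the final hidden and cell state after three time steps. In a real model the scalars become vectors and the weights become learned matrices, but the gate structure is the same.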