Starting engagement detection towards a companion robot using multimodal features
Dominique Vaufreydaz (INRIA Grenoble Rhône-Alpes / LIG Laboratoire d'Informatique de Grenoble), Wafa Johal (LIG), Claudine Combe (INRIA Grenoble Rhône-Alpes / LIG Laboratoire d'Informatique de Grenoble)

TL;DR
This paper presents a multimodal, feature-based method, inspired by the social sciences, for detecting a person's intention to start interacting with a robot. It reports improved accuracy over traditional spatial features in real-world conditions.
Contribution
It introduces a multimodal feature set for engagement detection, validates it on spontaneous interaction data, and highlights both the importance of feature selection and the difficulty of reducing the feature space.
Findings
Multimodal features outperform spatial features in detection accuracy.
Seven features are sufficient for effective engagement detection.
Reducing the feature space remains a difficult problem.
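To make the idea of keeping only a small subset of features concrete, here is a minimal sketch of greedy forward feature selection. It is an illustration under stated assumptions, not the authors' method: this summary does not say which selection algorithm or classifier the paper uses, so the nearest-centroid scorer, the synthetic data, and all function names (`make_synthetic`, `centroid_accuracy`, `forward_select`) are hypothetical. The toy run keeps 3 of 6 columns, whereas the paper reports that seven features suffice on its own data.

```python
import random


def make_synthetic(n=400, n_features=6, informative=(0, 1), shift=2.0, seed=0):
    """Toy data (not from the paper): label 1 shifts only the informative columns."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if label == 1:
            for j in informative:
                row[j] += shift
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X, y, cols):
    """Training accuracy of a nearest-centroid classifier restricted to `cols`."""
    sums = {0: [0.0] * len(cols), 1: [0.0] * len(cols)}
    counts = {0: 0, 1: 0}
    for row, label in zip(X, y):
        counts[label] += 1
        for k, j in enumerate(cols):
            sums[label][k] += row[j]
    cents = {c: [s / counts[c] for s in sums[c]] for c in (0, 1)}
    correct = 0
    for row, label in zip(X, y):
        dist = {
            c: sum((row[j] - cents[c][k]) ** 2 for k, j in enumerate(cols))
            for c in (0, 1)
        }
        pred = 0 if dist[0] <= dist[1] else 1
        correct += pred == label
    return correct / len(y)


def forward_select(X, y, k):
    """Greedily add the column that most improves the score until k are chosen."""
    selected = []
    remaining = list(range(len(X[0])))
    while len(selected) < k:
        best = max(remaining, key=lambda j: centroid_accuracy(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


X, y = make_synthetic()
selected = forward_select(X, y, k=3)
print("selected columns:", selected)
```

Scoring on the training set keeps the sketch short. A real evaluation would score each candidate subset on held-out data, since greedy selection on training accuracy tends to overfit.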
Abstract
Recognition of intentions is a subconscious cognitive process vital to human communication. This skill enables anticipation and increases the quality of interactions between humans. Within the context of engagement, non-verbal signals are used to communicate the intention of starting an interaction with a partner. In this paper, we investigate methods to detect these signals in order to allow a robot to know when it is about to be addressed. The originality of our approach lies in taking inspiration from social and cognitive sciences to perform our perception task. We investigate meaningful features, i.e. human-readable features, and determine which of these are important for recognizing someone's intention to start an interaction. Classically, spatial information such as the human's position and speed and the human-robot distance is used to detect engagement. Our approach integrates…
