Whom to Respond To? A Transformer-Based Model for Multi-Party Social Robot Interaction
He Zhu, Ryo Miyoshi, Yuki Okafuji

TL;DR
This paper introduces a Transformer-based framework for social robots to effectively determine when and whom to respond to in multi-party interactions, enhancing naturalness and context-awareness.
Contribution
It presents a novel multi-task learning model with new loss functions and a real-world dataset for multi-party human-robot interaction.
Findings
Achieves state-of-the-art response decision accuracy
Outperforms heuristic and single-task baselines
Handles real-world complexities like gaze misalignment
Abstract
Prior human-robot interaction (HRI) research has primarily focused on single-user interactions, where robots do not need to consider the timing or recipient of their responses. However, in multi-party interactions, such as at malls and hospitals, social robots must understand the context and decide both when and to whom they should respond. In this paper, we propose a Transformer-based multi-task learning framework to improve the decision-making process of social robots, particularly in multi-user environments. Considering the characteristics of HRI, we propose two novel loss functions: one that enforces constraints on active speakers to improve scene modeling, and another that guides response selection towards utterances specifically directed at the robot. Additionally, we construct a novel multi-party HRI dataset that captures real-world complexities, such as gaze misalignment.…
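As a rough illustration of the multi-task idea described in the abstract, the sketch below combines a "when to respond" objective with a "whom to respond to" objective as a weighted sum. This is a generic multi-task formulation, not the paper's method: the two novel loss functions mentioned above are not specified in this summary and are not reproduced here. The function names, the weights `w_when` and `w_whom`, and the choice to skip the "whom" term when no response is expected are all assumptions made for illustration.

```python
import math

EPS = 1e-12  # guards against log(0)


def binary_cross_entropy(prob, label):
    """Loss for the 'when' task: should the robot respond (label 1) or not (label 0)?"""
    prob = min(max(prob, EPS), 1.0 - EPS)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def cross_entropy(probs, target_index):
    """Loss for the 'whom' task: negative log-probability of the true addressee."""
    return -math.log(max(probs[target_index], EPS))


def multitask_loss(respond_prob, respond_label, addressee_probs, addressee_label,
                   w_when=1.0, w_whom=1.0):
    """Weighted sum of the two task losses (hypothetical weighting scheme).

    Assumption: the 'whom' term only counts when a response is expected,
    since there is no addressee to predict otherwise.
    """
    loss_when = binary_cross_entropy(respond_prob, respond_label)
    loss_whom = cross_entropy(addressee_probs, addressee_label) if respond_label == 1 else 0.0
    return w_when * loss_when + w_whom * loss_whom


# Example: the model is fairly sure it should respond, and favours person 1.
example = multitask_loss(0.9, 1, [0.1, 0.8, 0.1], 1)
```

In a real system, the probabilities would come from the Transformer's task-specific output heads, and the weights would be tuned on validation data.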
Taxonomy
Topics: Social Robot Interaction and HRI
