Whom to Respond To? A Transformer-Based Model for Multi-Party Social Robot Interaction

He Zhu; Ryo Miyoshi; Yuki Okafuji

arXiv:2507.10960 · cs.RO · July 16, 2025

Open Access

TL;DR

This paper introduces a Transformer-based framework that helps social robots decide when to respond and to whom in multi-party interactions, making their behavior more natural and context-aware.

Contribution

It presents a novel multi-task learning model with two new loss functions, along with a new real-world dataset for multi-party human-robot interaction.

Findings

01

Achieves state-of-the-art response decision accuracy

02

Outperforms heuristic and single-task baselines

03

Handles real-world complexities such as gaze misalignment

Abstract

Prior human-robot interaction (HRI) research has primarily focused on single-user interactions, where robots do not need to consider the timing or recipient of their responses. However, in multi-party interactions, such as those in malls and hospitals, social robots must understand the context and decide both when and to whom they should respond. In this paper, we propose a Transformer-based multi-task learning framework to improve the decision-making process of social robots, particularly in multi-user environments. Considering the characteristics of HRI, we propose two novel loss functions: one that enforces constraints on active speakers to improve scene modeling, and another that guides response selection towards utterances specifically directed at the robot. Additionally, we construct a novel multi-party HRI dataset that captures real-world complexities, such as gaze misalignment.…

Taxonomy

Topics: Social Robot Interaction and HRI