HandMeThat: Human-Robot Communication in Physical and Social Environments

Yanming Wan; Jiayuan Mao; Joshua B. Tenenbaum

arXiv:2310.03779 · cs.AI · October 9, 2023 · 2 cites

TL;DR

HandMeThat is a comprehensive benchmark for evaluating human-robot communication in physical and social contexts, emphasizing understanding ambiguous instructions through physical and social cues, with initial models showing limited performance.

Contribution

The paper introduces HandMeThat, a new benchmark dataset for holistic evaluation of instruction understanding in physical and social environments, including a textual interface and baseline evaluations.

Findings

01 Baseline models perform poorly, indicating room for improvement.

02 The benchmark covers physical and social cues in human-robot interactions.

03 HandMeThat contains 10,000 episodes of human-robot interaction data.

Abstract

We introduce HandMeThat, a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments. While previous datasets primarily focused on language grounding and planning, HandMeThat considers the resolution of human instructions with ambiguities based on the physical (object states and relations) and social (human actions and goals) information. HandMeThat contains 10,000 episodes of human-robot interactions. In each episode, the robot first observes a trajectory of human actions towards her internal goal. Next, the robot receives a human instruction and should take actions to accomplish the subgoal set through the instruction. In this paper, we present a textual interface for our benchmark, where the robot interacts with a virtual environment through textual commands. We evaluate several baseline models on HandMeThat, and show that both…
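The episode structure described in the abstract (observe a human trajectory, receive an instruction, then act through textual commands until the subgoal holds) can be pictured with a minimal sketch. This is not the benchmark's actual interface: `Episode`, `ToyTextEnv`, the `give <object>` command, and the `human-has` fact format are all hypothetical names chosen for illustration, and the kitchen example is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical container for one episode; field names are illustrative."""
    human_trajectory: list[str]  # human actions the robot observes first
    instruction: str             # possibly ambiguous human utterance
    subgoal: set[str]            # facts that must hold for success


@dataclass
class ToyTextEnv:
    """Toy stand-in for a textual interface: commands in, text out."""
    episode: Episode
    facts: set[str] = field(default_factory=set)

    def reset(self) -> str:
        # The first observation bundles the observed trajectory and the instruction.
        self.facts = set()
        history = "; ".join(self.episode.human_trajectory)
        return f"Human did: {history}. Human says: {self.episode.instruction}"

    def step(self, command: str) -> tuple[str, bool]:
        # Only one assumed command form is modeled: "give <object>".
        verb, _, obj = command.partition(" ")
        if verb == "give" and obj:
            self.facts.add(f"human-has {obj}")
            observation = f"You hand over the {obj}."
        else:
            observation = "Nothing happens."
        # The episode ends once every subgoal fact holds.
        done = self.episode.subgoal <= self.facts
        return observation, done


# Invented example: the instruction alone does not say which object is meant,
# so the robot must use the observed actions to guess the intended one.
episode = Episode(
    human_trajectory=["open fridge", "take bread", "take plate"],
    instruction="Hand me that.",
    subgoal={"human-has knife"},
)
env = ToyTextEnv(episode)
first_observation = env.reset()
observation, done = env.step("give knife")
```

The sketch keeps the subgoal inside the environment rather than exposing it to the agent, mirroring the idea that the robot has to infer what the human wants from the trajectory and the utterance.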

Videos

HandMeThat: Human-Robot Communication in Physical and Social Environments · slideslive

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling