Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration
Xavier Puig, Tianmin Shu, Shuang Li, Zilin Wang, Yuan-Hong Liao, Joshua B. Tenenbaum, Sanja Fidler, Antonio Torralba

TL;DR
This paper introduces the Watch-And-Help challenge and VirtualHome-Social environment to evaluate social perception and human-AI collaboration in household tasks, enabling systematic assessment of machine social intelligence.
Contribution
It presents a new challenge and benchmark for testing social intelligence in AI, focusing on understanding goals and collaborating with humans in household tasks.
Findings
The benchmark includes both planning-based and learning-based baselines.
AI agents outperform random baselines in task completion.
Human-AI collaboration improves with advanced social perception.
Abstract
In this paper, we introduce Watch-And-Help (WAH), a challenge for testing social intelligence in agents. In WAH, an AI agent needs to help a human-like agent perform a complex household task efficiently. To succeed, the AI agent needs to i) understand the underlying goal of the task by watching a single demonstration of the human-like agent performing the same task (social perception), and ii) coordinate with the human-like agent to solve the task in an unseen environment as fast as possible (human-AI collaboration). For this challenge, we build VirtualHome-Social, a multi-agent household environment, and provide a benchmark including both planning and learning based baselines. We evaluate the performance of AI agents with the human-like agent as well as with real humans using objective metrics and subjective user ratings. Experimental results demonstrate that the proposed challenge and…
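The two stages described in the abstract (watch a demonstration to infer the goal, then help in a new environment) can be illustrated with a toy sketch. This is not the authors' implementation or the VirtualHome-Social API. It assumes, purely for illustration, that a goal can be written as a set of (relation, object, target) predicates, that the goal is whatever becomes true over the course of the demonstration, and that each agent can satisfy one missing predicate per step. The names infer_goal and steps_to_finish are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: a predicate such as ("on", "plate", "table").
Predicate = Tuple[str, str, str]
State = FrozenSet[Predicate]


def infer_goal(demo: List[State]) -> Set[Predicate]:
    """Watch stage (toy): predicates true at the end of the demo but not at the start."""
    return set(demo[-1]) - set(demo[0])


def steps_to_finish(goal: Set[Predicate], state: Set[Predicate], n_agents: int) -> int:
    """Help stage (toy): every agent satisfies one remaining predicate per step."""
    remaining = set(goal) - set(state)
    steps = 0
    while remaining:
        for _ in range(min(n_agents, len(remaining))):
            remaining.pop()
        steps += 1
    return steps


# A single demonstration of the human-like agent setting the table.
demo = [
    frozenset({("in", "plate", "cabinet"), ("in", "fork", "drawer")}),
    frozenset({("on", "plate", "table"), ("in", "fork", "drawer")}),
    frozenset({("on", "plate", "table"), ("on", "fork", "table")}),
]
goal = infer_goal(demo)

# An unseen environment where the same objects start in different places.
new_env_state = {("in", "plate", "sink"), ("in", "fork", "cabinet")}
alone = steps_to_finish(goal, new_env_state, n_agents=1)
together = steps_to_finish(goal, new_env_state, n_agents=2)
print(goal, alone, together)
```

In this toy setting the helper is only useful if the inferred goal is correct, which is why the sketch keeps the watch stage and the help stage as separate functions.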
Taxonomy
Topics: Reinforcement Learning in Robotics · Social Robot Interaction and HRI · Artificial Intelligence in Games
