H2O: A Benchmark for Visual Human-human Object Handover Analysis

Ruolin Ye; Wenqiang Xu; Zhendong Xue; Tutian Tang; Yanfeng Wang; Cewu Lu

arXiv:2104.11466 · cs.CV · October 28, 2021

TL;DR

This paper introduces H2O, a comprehensive dataset for analyzing human-human object handovers through visual data, supporting multiple vision tasks and including a baseline for grasp prediction.

Contribution

It provides a novel, richly annotated dataset for human handover analysis and introduces RGPNet, a baseline model for receiver grasp prediction in this context.

Findings

01. RGPNet can generate plausible grasps from pre-handover states

02. The dataset enables evaluation of hand and object pose estimation

03. H2O supports robot imitation learning for handover tasks

Abstract

Object handover is a common human collaboration behavior that attracts attention from researchers in Robotics and Cognitive Science. Though visual perception plays an important role in the object handover task, the whole handover process has rarely been specifically explored. In this work, we propose a novel richly annotated dataset, H2O, for visual analysis of human-human object handovers. H2O, which contains 18K video clips involving 15 people who hand over 30 objects to each other, is a multi-purpose benchmark. It can support several vision-based tasks, among which we provide a baseline method, RGPNet, for a less-explored task named Receiver Grasp Prediction. Extensive experiments show that RGPNet can produce plausible grasps based on the giver's hand-object states in the pre-handover phase. We also report the hand and object pose errors with existing baselines…

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications