H2O: Two Hands Manipulating Objects for First Person Interaction Recognition
Taein Kwon, Bugra Tekin, Jan Stuhmer, Federica Bogo, Marc Pollefeys

TL;DR
This paper introduces H2O, a comprehensive egocentric dataset with 3D annotations of two hands and objects, and proposes a method to recognize first-person interactions by estimating hand and object poses from RGB images.
Contribution
The paper presents what the authors describe as the first egocentric benchmark with 3D annotations of both hands and the objects they manipulate, built on synchronized multi-view RGB-D data, together with a pose-based interaction recognition method that uses graph convolutional networks.
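To make the idea of graph convolution over pose data concrete, here is a minimal sketch of one generic graph-convolution layer (symmetric-normalized adjacency with self-loops) applied to skeleton-like nodes. It is not the authors' architecture: the function names, the toy edge list, and the feature sizes are all assumptions chosen for illustration.

```python
import numpy as np


def normalized_adjacency(edges, num_nodes):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    adj = np.eye(num_nodes)  # self-loops so each node keeps its own features
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]


def graph_conv(features, adj_norm, weights):
    """One layer: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


# Hypothetical toy graph: three joints in a chain, each with a 3D position.
toy_edges = [(0, 1), (1, 2)]
toy_adj = normalized_adjacency(toy_edges, num_nodes=3)
toy_joints = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0]])
toy_weights = np.ones((3, 4))  # assumed projection from 3 to 4 channels
toy_output = graph_conv(toy_joints, toy_adj, toy_weights)
```

In a pose-based recognizer, the nodes would be hand joints and object keypoints, and the per-node outputs would be pooled over the graph and over time before classification; those steps are omitted here.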
Findings
Achieved state-of-the-art accuracy in first-person interaction recognition.
Established a strong baseline for joint hand-object pose estimation.
Provided a new benchmark dataset for egocentric 3D interaction analysis.
Abstract
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects. To this end, we propose a method to create a unified dataset for egocentric 3D interaction recognition. Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame. Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds. To the best of our knowledge, this is the first benchmark that enables the study of first-person actions with the use of the pose of both left and right hands manipulating objects and presents an unprecedented level of detail for…
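The per-frame annotations listed in the abstract can be pictured as a small record. The sketch below is an illustration only. The field names, the 21-joint hand skeleton, and the 4x4 matrix encoding of the 6D object pose are assumptions, not the dataset's actual file format.

```python
from dataclasses import dataclass

import numpy as np

NUM_JOINTS = 21  # assumption: a common hand-skeleton size, not stated in the abstract


@dataclass
class FrameAnnotation:
    """Hypothetical container for one annotated frame."""
    left_hand: np.ndarray    # (NUM_JOINTS, 3) joint positions
    right_hand: np.ndarray   # (NUM_JOINTS, 3) joint positions
    object_pose: np.ndarray  # (4, 4) rigid transform: rotation plus translation
    object_class: int
    interaction_label: int


def make_pose(rotation, translation):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 transform."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def transform_points(pose, points):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return points @ pose[:3, :3].T + pose[:3, 3]


# Example: rotate 90 degrees about the z-axis, then shift by one unit along x.
rot_z_90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
frame = FrameAnnotation(
    left_hand=np.zeros((NUM_JOINTS, 3)),
    right_hand=np.zeros((NUM_JOINTS, 3)),
    object_pose=make_pose(rot_z_90, np.array([1.0, 0.0, 0.0])),
    object_class=0,
    interaction_label=0,
)
moved = transform_points(frame.object_pose, np.array([[1.0, 0.0, 0.0]]))
```

A 6D pose has three rotational and three translational degrees of freedom. The 4x4 matrix is one convenient encoding: object mesh vertices can be moved into the camera frame with a single call to `transform_points`.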
