HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
Yunze Liu, Yun Liu, Che Jiang, Kangbo Lyu, Weikang Wan, Hao Shen, Boqiang Liang, Zhoujie Fu, He Wang, Li Yi

TL;DR
HOI4D is a comprehensive 4D egocentric dataset with extensive annotations designed to advance research in category-level human-object interactions, offering new benchmarks and challenging existing methods.
Contribution
The paper introduces HOI4D, a large-scale, richly annotated 4D dataset for egocentric human-object interaction research, along with three new benchmarking tasks.
Findings
Existing methods struggle with HOI4D's complexity.
HOI4D enables new research directions in 4D interaction understanding.
The dataset supports three benchmark tasks: semantic segmentation of 4D dynamic point cloud sequences, category-level object pose tracking, and egocentric action segmentation.
Abstract
We present HOI4D, a large-scale 4D egocentric dataset with rich annotations, to catalyze the research of category-level human-object interaction. HOI4D consists of 2.4M RGB-D egocentric video frames over 4000 sequences collected by 4 participants interacting with 800 different object instances from 16 categories over 610 different indoor rooms. Frame-wise annotations for panoptic segmentation, motion segmentation, 3D hand pose, category-level object pose and hand action have also been provided, together with reconstructed object meshes and scene point clouds. With HOI4D, we establish three benchmarking tasks to promote category-level HOI from 4D visual signals including semantic segmentation of 4D dynamic point cloud sequences, category-level object pose tracking, and egocentric action segmentation with diverse interaction targets. In-depth analysis shows HOI4D poses great challenges to…
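To give a sense of scale, the headline figures in the abstract can be turned into rough per-sequence and per-category averages. The snippet below is only a back-of-the-envelope sketch: it treats the rounded figures quoted above ("2.4M" frames, "over 4000" sequences) as exact, and the variable and function names are illustrative and not taken from the paper or any released toolkit.

```python
# Headline figures as quoted in the abstract (rounded values, treated as exact here).
FRAMES = 2_400_000   # "2.4M RGB-D egocentric video frames"
SEQUENCES = 4000     # "over 4000 sequences"
INSTANCES = 800      # "800 different object instances"
CATEGORIES = 16      # "16 categories"


def per_unit(total: int, units: int) -> float:
    """Return the average of `total` spread evenly over `units`."""
    return total / units


frames_per_sequence = per_unit(FRAMES, SEQUENCES)
instances_per_category = per_unit(INSTANCES, CATEGORIES)

print(f"~{frames_per_sequence:.0f} frames per sequence")
print(f"~{instances_per_category:.0f} object instances per category")
```

Under these assumptions, a sequence averages about 600 frames and each category about 50 object instances; the actual per-sequence and per-category distributions may be uneven.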
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Multimodal Machine Learning Applications
