TL;DR
This paper introduces Hoi!, a multimodal dataset that records video, end-effector forces, and tactile signals during articulated-object manipulation, supporting research on cross-view interaction understanding and transfer between human and robotic viewpoints.
Contribution
The dataset combines force, video, and tactile data across four embodiments and 38 environments, enabling research on force-grounded, cross-view articulated manipulation.
Findings
Provides 3048 sequences of interactions with 381 articulated objects in 38 environments.
Includes data from four embodiments: human hand, human hand with a wrist-mounted camera, handheld UMI gripper, and a custom Hoi! gripper.
Enables evaluation of how well methods transfer between human and robotic viewpoints.
Abstract
We present a dataset for force-grounded, cross-view articulated manipulation that couples what is seen with what is done and what is felt during real human interaction. The dataset contains 3048 sequences across 381 articulated objects in 38 environments. Each object is operated in four embodiments: (i) human hand, (ii) human hand with a wrist-mounted camera, (iii) handheld UMI gripper, and (iv) a custom Hoi! gripper, where the tool embodiment provides end-effector forces and tactile sensing. Our dataset offers a holistic view of interaction understanding from video, enabling researchers to evaluate how well methods transfer between human and robotic viewpoints and to investigate underexplored modalities such as interaction forces. The project website can be found at https://timengelbracht.github.io/Hoi-Dataset-Website/.
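The counts quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only a sanity check on the published totals. The variable names are illustrative, and the per-embodiment figure is an average that assumes sequences are spread evenly over objects and embodiments, which the abstract does not state.

```python
# Totals quoted in the abstract.
NUM_SEQUENCES = 3048
NUM_OBJECTS = 381
NUM_ENVIRONMENTS = 38
NUM_EMBODIMENTS = 4

# Average number of sequences recorded per object.
sequences_per_object, remainder = divmod(NUM_SEQUENCES, NUM_OBJECTS)

# Assumption (not stated in the abstract): sequences are split evenly
# across the four embodiments for every object.
sequences_per_object_per_embodiment = sequences_per_object / NUM_EMBODIMENTS

# Average number of objects per environment (not a whole number).
objects_per_environment = NUM_OBJECTS / NUM_ENVIRONMENTS

print(f"sequences per object: {sequences_per_object} (remainder {remainder})")
print(f"average per object and embodiment: {sequences_per_object_per_embodiment}")
print(f"average objects per environment: {objects_per_environment:.2f}")
```

The totals divide cleanly: 3048 sequences over 381 objects gives exactly 8 per object, or 2 per object and embodiment on average under the even-split assumption.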
