TL;DR
SHOW3D introduces a novel multi-camera capture system and dataset for accurate 3D understanding of hand-object interaction in diverse real-world environments, overcoming the limited environmental diversity of datasets captured in controlled studio settings.
Contribution
The paper presents a marker-less, multi-camera capture system and a large-scale dataset with precise 3D annotations of hands and objects in unconstrained environments, so that models trained on the data can generalize better to real-world scenarios.
Findings
The capture system produces high-quality 3D annotations in in-the-wild settings.
The SHOW3D dataset includes diverse indoor and outdoor scenes with annotated hand-object interactions.
Experiments demonstrate improved model performance on downstream tasks using SHOW3D data.
Abstract
Accurate 3D understanding of human hands and objects during manipulation remains a significant challenge for egocentric computer vision. Existing hand-object interaction datasets are predominantly captured in controlled studio settings, which limits both environmental diversity and the ability of models trained on such data to generalize to real-world scenarios. To address this challenge, we introduce a novel marker-less multi-camera system that allows for nearly unconstrained mobility in genuinely in-the-wild conditions, while still having the ability to generate precise 3D annotations of hands and objects. The capture system consists of a lightweight, back-mounted, multi-camera rig that is synchronized and calibrated with a user-worn VR headset. For 3D ground-truth annotation of hands and objects, we develop an ego-exo tracking pipeline and rigorously evaluate its quality. Finally, we…
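As a rough illustration of why a synchronized, calibrated multi-camera setup can yield 3D annotations, the sketch below triangulates a single 3D point from two viewing rays using the midpoint method. This is not the paper's ego-exo tracking pipeline, which the abstract does not detail; the function name `triangulate_midpoint`, the camera centers, and the test point are all hypothetical values chosen for the example.

```python
# Minimal sketch (assumption: NOT the authors' method) of two-view
# midpoint triangulation. Each camera contributes a ray o + s*d, where
# o is the camera center and d the viewing direction toward a keypoint.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def point_on_ray(origin, direction, s):
    return [o + s * d for o, d in zip(origin, direction)]

def triangulate_midpoint(o1, d1, o2, d2):
    """Return the midpoint of the shortest segment between two rays."""
    w = sub(o1, o2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    # Closed-form minimizer of |(o1 + s*d1) - (o2 + t*d2)|^2.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = point_on_ray(o1, d1, s)
    p2 = point_on_ray(o2, d2, t)
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]

# Hypothetical setup: a headset camera at the origin and a second
# camera 0.2 m to the side, both observing the same hand keypoint.
x_true = [0.05, -0.02, 0.6]
o_ego, o_exo = [0.0, 0.0, 0.0], [0.2, 0.0, 0.0]
x_est = triangulate_midpoint(o_ego, sub(x_true, o_ego),
                             o_exo, sub(x_true, o_exo))
```

With noise-free rays the estimate coincides with the true point. With real detections the two rays do not intersect exactly, which is one reason additional views and accurate calibration matter for annotation quality.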
