Towards unconstrained joint hand-object reconstruction from RGB videos
Yana Hasson, Gül Varol, Ivan Laptev, Cordelia Schmid

TL;DR
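This paper introduces a learning-free fitting method for 3D reconstruction of hands and manipulated objects from monocular RGB videos. Because it needs no 3D supervision, it can be applied outside the constrained laboratory settings and simulators where 3D ground truth is available, which is relevant to robotics and learning from human demonstrations.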
Contribution
It presents a fitting approach that handles two-hand object interactions without supervised training, unlike supervised learning methods that depend on 3D ground truth data.
Findings
Quantitatively evaluated and applicable to datasets of varying difficulty for which training data is unavailable
Does not require 3D supervision or training data
Handles two-hand object interactions seamlessly
Abstract
Our work aims to obtain 3D reconstruction of hands and manipulated objects from monocular videos. Reconstructing hand-object manipulations holds great potential for robotics and learning from human demonstrations. The supervised learning approach to this problem, however, requires 3D supervision and remains limited to constrained laboratory settings and simulators for which 3D ground truth is available. In this paper, we first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions. Our method relies on cues obtained with common methods for object detection, hand pose estimation and instance segmentation. We quantitatively evaluate our approach and show that it can be applied to datasets with varying levels of difficulty for which training data is unavailable.
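The idea of fitting a 3D model to 2D image cues, rather than training a network, can be illustrated with a toy example. The sketch below is not the authors' pipeline. It assumes a known rigid set of 3D model points, a pinhole camera with unit focal length, and noise-free 2D keypoints, and it recovers only a translation by Gauss-Newton minimization of the reprojection error. All names (`project`, `solve3`, `fit_translation`) are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def project(point: Vec3, t: Sequence[float]) -> Vec2:
    """Pinhole projection (assumed unit focal length) of a model point shifted by t."""
    x, y, z = point[0] + t[0], point[1] + t[1], point[2] + t[2]
    return (x / z, y / z)


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / m[i][i]
    return x


def fit_translation(model: List[Vec3], keypoints: List[Vec2],
                    t0: Vec3, iterations: int = 20) -> List[float]:
    """Gauss-Newton fit of a translation minimizing squared reprojection error."""
    t = list(t0)
    for _ in range(iterations):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for p, (u, v) in zip(model, keypoints):
            x, y, z = p[0] + t[0], p[1] + t[1], p[2] + t[2]
            # Each entry: (residual, Jacobian row with respect to tx, ty, tz).
            rows = [
                (x / z - u, (1.0 / z, 0.0, -x / z ** 2)),
                (y / z - v, (0.0, 1.0 / z, -y / z ** 2)),
            ]
            for r, j in rows:
                for a in range(3):
                    jtr[a] += j[a] * r
                    for b in range(3):
                        jtj[a][b] += j[a] * j[b]
        # Normal equations: (J^T J) step = -J^T r
        step = solve3(jtj, [-g for g in jtr])
        t = [ti + si for ti, si in zip(t, step)]
    return t


# Synthetic example: made-up model points and a made-up ground-truth translation.
model = [(-0.05, -0.05, 0.0), (0.05, -0.05, 0.02), (0.05, 0.05, -0.03),
         (-0.05, 0.05, 0.04), (0.0, 0.0, 0.05)]
true_t = (0.05, -0.02, 0.6)
keypoints = [project(p, true_t) for p in model]  # stands in for detector output
estimate = fit_translation(model, keypoints, t0=(0.0, 0.0, 0.5))
```

In a real setting the 2D evidence would come from off-the-shelf detectors, as the abstract describes, and the optimized parameters would presumably cover hand and object pose rather than a single translation.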
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Hand Gesture Recognition Systems
