HOSt3R: Keypoint-free Hand-Object 3D Reconstruction from RGB images
Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Jean-Sébastien Franco, Grégory Rogez

TL;DR
HOSt3R introduces a keypoint-free method for 3D hand-object reconstruction from RGB videos, improving robustness and generalization without relying on keypoint detection or object templates.
Contribution
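The paper presents HOSt3R, a keypoint detector-free approach to estimating hand-object 3D transformations from monocular video or images, integrated with a multi-view reconstruction pipeline. Removing the dependence on keypoint detection is intended to improve scalability and generalization across diverse object geometries, weak textures, and mutual hand-object occlusions.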
Findings
Achieves state-of-the-art results on the SHOWMe benchmark.
Demonstrates effective generalization to unseen object categories.
Operates without requiring camera intrinsics or pre-scanned templates.
Abstract
Hand-object 3D reconstruction has become increasingly important for applications in human-robot interaction and immersive AR/VR experiences. A common approach for object-agnostic hand-object reconstruction from RGB sequences involves a two-stage pipeline: hand-object 3D tracking followed by multi-view 3D reconstruction. However, existing methods rely on keypoint detection techniques, such as Structure from Motion (SfM) and hand-keypoint optimization, which struggle with diverse object geometries, weak textures, and mutual hand-object occlusions, limiting scalability and generalization. As a key enabler to generic and seamless, non-intrusive applicability, we propose in this work a robust, keypoint detector-free approach to estimating hand-object 3D transformations from monocular motion video/images. We further integrate this with a multi-view reconstruction pipeline to accurately…
