Follow My Hold: Hand-Object Interaction Reconstruction through Geometric Guidance
Ayce Idil Aytekin, Helge Rhodin, Rishabh Dabral, Christian Theobalt

TL;DR
This paper introduces a diffusion-based method for reconstructing 3D hand-held object geometry from monocular images, leveraging hand-object interaction cues for high-quality, physically plausible results.
Contribution
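The paper presents a diffusion framework that combines inference-time geometric guidance with an optimization-in-the-loop design, so that object geometry and plausible hand-object interaction are produced during sampling rather than through extensive post-processing.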
Findings
Produces high-quality 3D reconstructions from monocular images.
Ensures physically plausible hand-object interactions.
Performs well under occlusion and in real-world scenarios.
Abstract
We propose a novel diffusion-based framework for reconstructing 3D geometry of hand-held objects from monocular RGB images by leveraging hand-object interaction as geometric guidance. Our method conditions a latent diffusion model on an inpainted object appearance and uses inference-time guidance to optimize the object reconstruction, while simultaneously ensuring plausible hand-object interactions. Unlike prior methods that rely on extensive post-processing or produce low-quality reconstructions, our approach directly generates high-quality object geometry during the diffusion process by introducing guidance with an optimization-in-the-loop design. Specifically, we guide the diffusion model by applying supervision to the velocity field while simultaneously optimizing the transformations of both the hand and the object being reconstructed. This optimization is driven by multi-modal…
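The guidance idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the velocity field, the loss, and every name here (`toy_velocity`, `guidance_loss_grad`, `guided_sample`, `contact`, `offset`) are hypothetical stand-ins chosen for illustration. The sketch assumes an Euler sampler over a point set, a single quadratic "contact" loss in place of the paper's guidance signals, and a translation-only offset in place of the hand and object transformations.

```python
import numpy as np

def toy_velocity(x, t):
    # Hypothetical stand-in for a learned velocity field: pulls points to the origin.
    return -x

def guidance_loss_grad(x, offset, contact):
    # Toy guidance loss: 0.5 * || mean(x) + offset - contact ||^2.
    # Returns the loss and its gradients w.r.t. the sample x and the offset.
    n = x.shape[0]
    residual = x.mean(axis=0) + offset - contact
    grad_x = np.tile(residual / n, (n, 1))
    grad_offset = residual
    return 0.5 * float(residual @ residual), grad_x, grad_offset

def guided_sample(x0, contact, steps=50, guidance_scale=1.0,
                  inner_steps=5, inner_lr=0.2):
    x = x0.copy()
    offset = np.zeros(x.shape[1])  # assumed translation-only transform
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        # Optimization in the loop: refine the transform with the sample held fixed.
        for _ in range(inner_steps):
            _, _, g_off = guidance_loss_grad(x, offset, contact)
            offset -= inner_lr * g_off
        # Guidance: correct the velocity with the loss gradient w.r.t. the sample.
        _, g_x, _ = guidance_loss_grad(x, offset, contact)
        v = toy_velocity(x, t) - guidance_scale * g_x
        x = x + dt * v  # Euler step
    loss, _, _ = guidance_loss_grad(x, offset, contact)
    return x, offset, loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=(64, 3))
    contact = np.array([1.0, 1.0, 1.0])
    _, offset, loss = guided_sample(x0, contact)
    print("final offset:", offset, "final loss:", loss)
```

The point of the structure is the alternation: at each sampling step the transform is first optimized against the current sample, and the sample's velocity is then nudged by the gradient of the same loss, so both are refined together instead of in a separate post-processing pass.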
