MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips

Shibo Wang; Haonan He; Maria Parelli; Christoph Gebhardt; Zicong Fan; Jie Song

arXiv:2508.05506 · cs.CV · August 8, 2025

TL;DR

MagicHOI introduces a novel approach that uses large-scale view synthesis diffusion models as priors to improve 3D hand-object reconstruction from short monocular videos, especially when object visibility is limited.

Contribution

MagicHOI integrates view synthesis diffusion priors into hand-object reconstruction, enabling accurate results without full object visibility or extensive paired 3D hand-object data.

Findings

1. Outperforms existing state-of-the-art methods
2. Effectively regularizes unseen object regions
3. Enhances 3D reconstruction accuracy

Abstract

Most RGB-based hand-object reconstruction methods rely on object templates, while template-free methods typically assume full object visibility. This assumption often breaks in real-world settings, where fixed camera viewpoints and static grips leave parts of the object unobserved, resulting in implausible reconstructions. To overcome this, we present MagicHOI, a method for reconstructing hands and objects from short monocular interaction videos, even under limited viewpoint variation. Our key insight is that, despite the scarcity of paired 3D hand-object data, large-scale novel view synthesis diffusion models offer rich object supervision. This supervision serves as a prior to regularize unseen object regions during hand interactions. Leveraging this insight, we integrate a novel view synthesis model into our hand-object reconstruction framework. We further align hand to object by…
