Grasp in Gaussians: Fast Monocular Reconstruction of Dynamic Hand-Object Interactions
Ayce Idil Aytekin, Xu Chen, Zhengyang Shen, Thabo Beeler, Helge Rhodin, Rishabh Dabral, Christian Theobalt

TL;DR
GraG is a fast, robust method for reconstructing 3D hand-object interactions from monocular video using a compact Gaussian-based representation, outperforming prior methods in speed and accuracy.
Contribution
The paper introduces GraG, which combines classical Sum-of-Gaussians (SoG) tracking with initialization from pretrained large models to reconstruct 3D hand-object interactions efficiently from monocular video.
Findings
Reconstructs hand-object interactions 6.4x faster than prior work.
Improves object reconstruction accuracy by 13.4%.
Reduces hand joint position error by over 65%.
Abstract
We present Grasp in Gaussians (GraG), a fast and robust method for reconstructing dynamic 3D hand-object interactions from a single monocular video. Unlike recent approaches that optimize heavy neural representations, our method focuses on tracking the hand and the object efficiently, once initialized from pretrained large models. Our key insight is that accurate and temporally stable hand-object motion can be recovered using a compact Sum-of-Gaussians (SoG) representation, revived from classical tracking literature and integrated with generative Gaussian-based initializations. We initialize object pose and geometry using a video-adapted SAM3D pipeline, then convert the resulting dense Gaussian representation into a lightweight SoG via subsampling. This compact representation enables efficient and fast tracking while preserving geometric fidelity. For the hand, we adopt a complementary…
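To make the dense-to-SoG conversion described in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The abstract only says the dense Gaussian representation is converted "via subsampling", so the uniform random selection, the collapse to isotropic Gaussians, and the names `subsample_to_sog` and `sog_density` are all assumptions made for illustration.

```python
import numpy as np


def subsample_to_sog(means, scales, num_keep, seed=0):
    """Reduce a dense Gaussian set to a compact Sum-of-Gaussians (SoG).

    means:  (N, 3) Gaussian centers.
    scales: (N, 3) per-axis standard deviations.
    Assumption: uniform random subsampling; the paper's actual
    selection strategy is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    num_keep = min(num_keep, len(means))
    idx = rng.choice(len(means), size=num_keep, replace=False)
    # Assumption: collapse per-axis scales into one isotropic sigma each.
    sigmas = scales[idx].mean(axis=1)
    return means[idx], sigmas


def sog_density(points, centers, sigmas):
    """Evaluate the unnormalized sum of isotropic Gaussians at query points."""
    diff = points[:, None, :] - centers[None, :, :]   # (P, K, 3)
    sq_dist = np.sum(diff ** 2, axis=-1)              # (P, K)
    return np.exp(-0.5 * sq_dist / sigmas[None, :] ** 2).sum(axis=1)


# Hypothetical dense input: 1000 Gaussians with random centers and scales.
rng = np.random.default_rng(42)
dense_means = rng.normal(size=(1000, 3))
dense_scales = rng.uniform(0.01, 0.05, size=(1000, 3))

centers, sigmas = subsample_to_sog(dense_means, dense_scales, num_keep=64)
density_at_centers = sog_density(centers, centers, sigmas)
```

With far fewer primitives, each evaluation of the SoG touches only `num_keep` Gaussians per query point, which is the kind of saving a compact representation offers during per-frame tracking.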
