ForeHOI: Feed-forward 3D Object Reconstruction from Daily Hand-Object Interaction Videos
Yuantao Chen, Jiahao Chang, Chongjie Ye, Chaoran Zhang, Zhaojie Fang, Chenghong Li, Xiaoguang Han

TL;DR
ForeHOI is a fast, feed-forward model that reconstructs 3D object geometry from monocular hand-object videos, effectively handling occlusions and outperforming optimization-based methods in speed and accuracy.
Contribution
The paper introduces ForeHOI, a novel feed-forward approach for 3D object reconstruction from monocular videos, together with the first large-scale synthetic dataset for hand-object interactions.
Findings
Achieves state-of-the-art reconstruction accuracy.
Runs approximately 100 times faster than previous optimization-based methods.
Effectively handles severe occlusions in monocular videos.
Abstract
The ubiquity of monocular videos capturing daily hand-object interactions presents a valuable resource for embodied intelligence. While 3D hand reconstruction from in-the-wild videos has seen significant progress, reconstructing the involved objects remains challenging due to severe occlusions and the complex, coupled motion of the camera, hands, and object. In this paper, we introduce ForeHOI, a novel feed-forward model that directly reconstructs 3D object geometry from monocular hand-object interaction videos within one minute of inference time, eliminating the need for any pre-processing steps. Our key insight is that the joint prediction of 2D mask inpainting and 3D shape completion in a feed-forward framework can effectively address the problem of severe occlusion in monocular hand-held object videos, thereby achieving results that outperform optimization-based…
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · 3D Shape Modeling and Analysis
