ForeHOI: Feed-forward 3D Object Reconstruction from Daily Hand-Object Interaction Videos

Yuantao Chen; Jiahao Chang; Chongjie Ye; Chaoran Zhang; Zhaojie Fang; Chenghong Li; Xiaoguang Han

arXiv:2602.06226·cs.CV·February 9, 2026

Open Access · 1 dataset

TL;DR

ForeHOI is a fast, feed-forward model that reconstructs 3D object geometry from monocular hand-object videos, effectively handling occlusions and outperforming optimization-based methods in speed and accuracy.

Contribution

The paper introduces ForeHOI, a novel feed-forward approach for 3D object reconstruction from monocular hand-object interaction videos, together with the first large-scale synthetic dataset for hand-object interactions.

Findings

01. Achieves state-of-the-art reconstruction accuracy.

02. Runs approximately 100 times faster than previous optimization-based methods.

03. Effectively handles severe occlusions in monocular videos.

Abstract

The ubiquity of monocular videos capturing daily hand-object interactions presents a valuable resource for embodied intelligence. While 3D hand reconstruction from in-the-wild videos has seen significant progress, reconstructing the involved objects remains challenging due to severe occlusions and the complex, coupled motion of the camera, hands, and object. In this paper, we introduce ForeHOI, a novel feed-forward model that directly reconstructs 3D object geometry from monocular hand-object interaction videos within one minute of inference time, eliminating the need for any pre-processing steps. Our key insight is that jointly predicting 2D mask inpainting and 3D shape completion in a feed-forward framework can effectively address severe occlusion in monocular videos of hand-held objects, thereby achieving results that outperform optimization-based…

Code & Models

Datasets

YuantaoChen/ForeHOI
dataset · 23k downloads

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · 3D Shape Modeling and Analysis