MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training
Yuechen Xie, Haobo Jiang, Jian Yang, Yigong Zhang, Jin Xie

TL;DR
MaskHOI introduces a masked autoencoder pretraining framework that improves 3D hand-object interaction estimation from monocular RGB images by enhancing geometric awareness and occlusion robustness through region-specific masking and SDF-driven learning.
Contribution
The paper proposes a novel Masked Autoencoder-based pretraining method with region-specific masking and SDF-driven learning to better handle occlusions and geometric complexity in 3D HOI estimation.
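As a rough illustration of what region-specific masking could look like, the sketch below masks image patches at a different ratio for each region instead of one uniform ratio. It is not the authors' implementation: the function name `region_specific_mask`, the region labels, and the ratio values are all hypothetical, and this summary does not state which region the paper masks more heavily.

```python
import random


def region_specific_mask(patch_regions, mask_ratios, seed=0):
    """Return one boolean per patch (True = masked).

    patch_regions: list of region labels, one per image patch.
    mask_ratios:   dict mapping each region label to the fraction
                   of that region's patches to mask.
    Both the labels and the ratios are illustrative assumptions.
    """
    rng = random.Random(seed)
    masked = [False] * len(patch_regions)

    # Group patch indices by the region they belong to.
    by_region = {}
    for idx, region in enumerate(patch_regions):
        by_region.setdefault(region, []).append(idx)

    # Sample patches to mask independently within each region.
    for region, indices in by_region.items():
        n_mask = round(mask_ratios[region] * len(indices))
        for idx in rng.sample(indices, n_mask):
            masked[idx] = True
    return masked


# Toy layout: 16 patches, of which 8 are background, 4 hand, 4 object.
patch_regions = ["background"] * 8 + ["hand"] * 4 + ["object"] * 4

# Placeholder ratios chosen only to show per-region control.
ratios = {"background": 0.25, "hand": 0.5, "object": 0.75}

mask = region_specific_mask(patch_regions, ratios)
print(sum(mask), "of", len(mask), "patches masked")
```

In an MAE-style setup, only the unmasked patches would be passed to the encoder, and a decoder would be trained to reconstruct the masked ones. A uniform scheme is the special case where every region shares the same ratio.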
Findings
Outperforms existing state-of-the-art methods
Enhances geometric awareness of the encoder
Improves robustness to occlusions
Abstract
In 3D hand-object interaction (HOI) tasks, estimating precise joint poses of hands and objects from monocular RGB input remains highly challenging due to the inherent geometric ambiguity of RGB images and the severe mutual occlusions that occur during interaction. To address these challenges, we propose MaskHOI, a novel Masked Autoencoder (MAE)-driven pretraining framework for enhanced HOI pose estimation. Our core idea is to leverage the masking-then-reconstruction strategy of MAE to encourage the feature encoder to infer missing spatial and structural information, thereby facilitating geometric-aware and occlusion-robust representation learning. Specifically, we observe that human hands exhibit far greater geometric complexity than rigid objects, so conventional uniform masking fails to effectively guide the reconstruction of fine-grained hand structures. To overcome this…
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Hand Gesture Recognition Systems
