Occlusion-Aware 3D Hand-Object Pose Estimation with Masked AutoEncoders

Hui Yang; Wei Sun; Jian Liu; Jin Zheng; Jian Xiao; Ajmal Mian

arXiv:2506.10816 · cs.CV · June 13, 2025

TL;DR

This paper introduces HOMAE, an occlusion-aware method for 3D hand-object pose estimation from monocular RGB images, built on masked autoencoders. It handles occlusion by learning context-aware features and by combining an implicit geometric representation (a signed distance field) with an explicit one (a point cloud).

Contribution

The paper proposes a target-focused masking strategy and a fusion of signed distance fields with point clouds to improve occlusion handling in hand-object pose estimation.
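To make the implicit/explicit pairing concrete, the sketch below shows one simple way a signed distance field (SDF) can yield a point cloud: sample a grid and keep the points that lie close to the zero level set. This is an illustration only, not the paper's pipeline. The analytic sphere stands in for a learned SDF, and the names `sphere_sdf` and `sdf_to_point_cloud`, the grid resolution, and the threshold `eps` are all assumptions made for the example.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

def sdf_to_point_cloud(sdf_fn, bounds=(-1.0, 1.0), resolution=32, eps=0.05):
    """Sample a regular 3D grid and keep points with |SDF| < eps.

    The surviving points form an explicit, near-surface point cloud derived
    from the implicit field. `eps` is a hypothetical thickness threshold.
    """
    axis = np.linspace(bounds[0], bounds[1], resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    grid = grid.reshape(-1, 3)          # (resolution**3, 3) query points
    distances = sdf_fn(grid)            # one signed distance per query point
    return grid[np.abs(distances) < eps]

# Toy stand-in for a predicted SDF: a sphere of radius 0.5 at the origin.
cloud = sdf_to_point_cloud(lambda p: sphere_sdf(p, np.zeros(3), 0.5))
print(cloud.shape)  # (number of near-surface points, 3)
```

In a learned setting the SDF values would come from a network rather than a closed-form function, and the resulting points could be passed to a point-based encoder. How HOMAE performs that fusion is not specified in this summary.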

Findings

01. Achieves state-of-the-art results on DexYCB and HO3Dv2 benchmarks.

02. Effectively models occluded regions by combining global context and local geometry.

03. Demonstrates robustness in challenging occlusion scenarios.

Abstract

Hand-object pose estimation from monocular RGB images remains a significant challenge, mainly because of the severe occlusions inherent in hand-object interactions. Existing methods do not sufficiently explore global structural perception and reasoning, which limits their effectiveness on occluded hand-object interactions. To address this challenge, we propose an occlusion-aware hand-object pose estimation method based on masked autoencoders, termed HOMAE. Specifically, we propose a target-focused masking strategy that imposes structured occlusion on regions of hand-object interaction, encouraging the model to learn context-aware features and reason about the occluded structures. We further integrate multi-scale features extracted from the decoder to predict a signed distance field (SDF), capturing both global context and fine-grained geometry. To enhance geometric perception, we…
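The target-focused masking idea can be sketched as a patch sampler that spends most of its masking budget inside the hand-object interaction region instead of masking uniformly at random. The code below is a minimal illustration under stated assumptions, not the authors' procedure: the function name `target_focused_mask`, the rectangular `box` describing the interaction region, and the `mask_ratio` and `focus` parameters are all hypothetical.

```python
import numpy as np

def target_focused_mask(grid_h, grid_w, box, mask_ratio=0.75, focus=0.8, rng=None):
    """Return a boolean (grid_h, grid_w) array; True marks a masked patch.

    box:        (row0, col0, row1, col1), half-open, in patch coordinates;
                a hypothetical stand-in for the hand-object interaction region.
    mask_ratio: fraction of all patches to mask.
    focus:      target fraction of masked patches drawn from inside the box.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_patches = grid_h * grid_w
    n_mask = int(round(mask_ratio * n_patches))

    inside = np.zeros((grid_h, grid_w), dtype=bool)
    r0, c0, r1, c1 = box
    inside[r0:r1, c0:c1] = True
    inside_idx = np.flatnonzero(inside)
    outside_idx = np.flatnonzero(~inside)

    # Spend the budget inside the box first, then fill from outside; if the
    # outside runs short, hand the remainder back to the inside region.
    n_in = min(len(inside_idx), int(round(focus * n_mask)))
    n_out = min(len(outside_idx), n_mask - n_in)
    n_in = min(len(inside_idx), n_mask - n_out)

    chosen = np.concatenate([
        rng.choice(inside_idx, size=n_in, replace=False),
        rng.choice(outside_idx, size=n_out, replace=False),
    ])
    mask = np.zeros(n_patches, dtype=bool)
    mask[chosen] = True
    return mask.reshape(grid_h, grid_w)

# A 14x14 patch grid with a 6x6 interaction region near the centre.
mask = target_focused_mask(14, 14, box=(4, 4, 10, 10), mask_ratio=0.5,
                           rng=np.random.default_rng(0))
print(mask.sum(), mask[4:10, 4:10].mean())
```

In a masked-autoencoder setup, only the unmasked patches would be fed to the encoder, so concentrating the mask on the interaction region forces the reconstruction to rely on surrounding context. In practice the region would likely come from hand and object segmentation or detection rather than a fixed rectangle.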

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Hand Gesture Recognition Systems