GenHOI: Generalized Hand-Object Pose Estimation with Occlusion Awareness

Hui Yang; Wei Sun; Jian Liu; Jian Xiao; Tao Xie; Hossein Rahmani; Ajmal Saeed Mian; Nicu Sebe; Gim Hee Lee

arXiv:2603.19013 · cs.CV · March 31, 2026

TL;DR

GenHOI is a framework for 3D hand-object pose estimation from a single RGB image that combines hierarchical semantic prompts, multi-modal masked modeling, and hand priors to handle occlusion and unseen interactions.

Contribution

GenHOI introduces a hierarchical semantic prompt and a multi-modal masked modeling strategy for robust, generalized hand-object pose estimation under occlusion and in unseen scenarios.

Findings

1. Achieves state-of-the-art results on DexYCB and HO3Dv2 benchmarks.

2. Effectively handles occlusion and unseen objects in pose estimation.

3. Utilizes multi-modal data and hierarchical prompts for improved generalization.

Abstract

Generalized 3D hand-object pose estimation from a single RGB image remains challenging due to the large variations in object appearances and interaction patterns, especially under heavy occlusion. We propose GenHOI, a framework for generalized hand-object pose estimation with occlusion awareness. GenHOI integrates hierarchical semantic knowledge with hand priors to enhance model generalization under challenging occlusion conditions. Specifically, we introduce a hierarchical semantic prompt that encodes object states, hand configurations, and interaction patterns via textual descriptions. This enables the model to learn abstract high-level representations of hand-object interactions for generalization to unseen objects and novel interactions while compensating for missing or ambiguous visual cues. To enable robust occlusion reasoning, we adopt a multi-modal masked modeling strategy over…
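To make the idea of a hierarchical semantic prompt more concrete, the sketch below assembles one textual description per level named in the abstract: object state, hand configuration, and interaction pattern. It is only an illustration. The class name, function name, field names, sentence templates, and example values are all assumptions and are not taken from the paper, which may phrase, order, or encode its prompts differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionDescription:
    # Hypothetical container: the three fields mirror the three levels
    # listed in the abstract; the names themselves are assumptions.
    object_state: str
    hand_configuration: str
    interaction_pattern: str


def build_hierarchical_prompt(desc: InteractionDescription) -> list[str]:
    """Return one text description per level, ordered object -> hand -> interaction."""
    # The sentence templates are illustrative, not the paper's wording.
    return [
        f"Object state: {desc.object_state}.",
        f"Hand configuration: {desc.hand_configuration}.",
        f"Interaction pattern: {desc.interaction_pattern}.",
    ]


# Made-up example values, used only to show the shape of the output.
example = InteractionDescription(
    object_state="a mug standing upright, partly hidden by the fingers",
    hand_configuration="right hand, fingers curled around the handle",
    interaction_pattern="the hand lifts the mug by its handle",
)
prompt_levels = build_hierarchical_prompt(example)
for level in prompt_levels:
    print(level)
```

In a full pipeline, each of these strings would presumably be passed to a text encoder so the resulting embeddings can supplement visual features when parts of the hand or object are occluded; that step is omitted here because the abstract does not specify it.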
