HandMCM: Multi-modal Point Cloud-based Correspondence State Space Model for 3D Hand Pose Estimation

Wencan Cheng; Gim Hee Lee

arXiv:2602.01586 · cs.CV · April 7, 2026

TL;DR

HandMCM is a novel multi-modal point cloud-based model that improves 3D hand pose estimation accuracy, especially under occlusion, by integrating correspondence modeling and local information filtering.

Contribution

The paper introduces HandMCM, a new method combining a state space model with multi-modal features to better handle occlusions in 3D hand pose estimation.

Findings

01. Outperforms existing methods on three benchmark datasets.

02. Achieves higher accuracy in severe occlusion scenarios.

03. Demonstrates robustness across various occlusion conditions.

Abstract

3D hand pose estimation, which requires accurately locating the 3D keypoints of the human hand, is crucial for many human-computer interaction applications such as augmented reality. However, the task is challenging because of self-occlusion of the hands and occlusions caused by interactions with objects. In this paper, we propose HandMCM to address these challenges. HandMCM is a novel method based on the state space model Mamba. By incorporating modules for local information injection/filtering and correspondence modeling, the proposed correspondence Mamba effectively learns the highly dynamic kinematic topology of keypoints across various occlusion scenarios. Moreover, by integrating multi-modal image features, we enhance the robustness and representational capacity of the input, leading to more accurate hand pose estimation. Empirical evaluations on three…
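For readers unfamiliar with state space models, the recurrence behind them can be sketched in a few lines. The block below is a generic, fixed-parameter linear scan over a sequence of keypoint feature vectors. It is not the authors' correspondence Mamba: Mamba makes its parameters input-dependent, and the abstract does not specify the injection/filtering or correspondence modules. The function name `diagonal_ssm_scan` and the parameters `decay`, `in_gain`, and `out_gain` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def diagonal_ssm_scan(
    tokens: Sequence[Sequence[float]],
    decay: Sequence[float],
    in_gain: Sequence[float],
    out_gain: Sequence[float],
) -> List[List[float]]:
    """Run a diagonal linear state space recurrence over a token sequence.

    For every channel i and step t:
        h_t[i] = decay[i] * h_{t-1}[i] + in_gain[i] * x_t[i]
        y_t[i] = out_gain[i] * h_t[i]

    Illustrative assumption: all parameters are fixed constants, whereas a
    selective model such as Mamba derives them from the input at each step.
    """
    state = [0.0] * len(decay)  # hidden state starts at zero
    outputs: List[List[float]] = []
    for x in tokens:  # one token per keypoint, visited in sequence order
        state = [a * h + b * xi for a, h, b, xi in zip(decay, state, in_gain, x)]
        outputs.append([c * h for c, h in zip(out_gain, state)])
    return outputs
```

Each output mixes the current token with a decayed summary of the tokens before it, which is the property that lets a sequence model pass information from visible keypoints to later ones in the scan order.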
