DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions

Yifan Zhou; Takehiko Ohkawa; Guwenxiao Zhou; Kanoko Goto; Takumi Hirose; Yusuke Sekikawa; Nakamasa Inoue

arXiv:2512.02727·cs.CV·December 3, 2025

TL;DR

This paper introduces DF-Mamba, a deformable state space model that enhances 3D hand pose estimation by capturing global context and handling occlusions more effectively than traditional CNN-based methods.

Contribution

The paper proposes DF-Mamba, a novel deformable state space framework that improves feature extraction for 3D hand pose estimation, outperforming existing backbones across multiple datasets.

Findings

01. DF-Mamba achieves state-of-the-art accuracy on five diverse datasets.

02. It significantly outperforms existing backbones like VMamba and Spatial-Mamba.

03. The method maintains inference speed comparable to ResNet-50.

Abstract

Modeling daily hand interactions often struggles with severe occlusions, such as when two hands overlap, which highlights the need for robust feature learning in 3D hand pose estimation (HPE). To handle such occluded hand images, it is vital to effectively learn the relationship between local image features (e.g., for occluded joints) and global context (e.g., cues from inter-joint or inter-hand relations, or the scene). However, most current 3D HPE methods still rely on ResNet for feature extraction, and the inductive bias of such CNNs may not be optimal for 3D HPE because of their limited capability to model global context. To address this limitation, we propose an effective and efficient framework for visual feature extraction in 3D HPE using recent state space modeling (i.e., Mamba), dubbed Deformable Mamba (DF-Mamba). DF-Mamba is designed to capture global context cues beyond standard convolution…
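As a rough illustration of why a state space model can carry context across a whole sequence of image features, the sketch below runs a generic scalar linear recurrence over a toy sequence. It is not the DF-Mamba architecture: the function name `ssm_scan` and the parameters `a`, `b`, and `c` are hypothetical, and both the deformable scanning described in the paper and the input-dependent parameters used by Mamba-style models are left out.

```python
from typing import List


def ssm_scan(xs: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    All names and default values here are illustrative assumptions.
    """
    h = 0.0  # hidden state, initially empty
    ys: List[float] = []
    for x in xs:
        h = a * h + b * x  # the state accumulates every earlier input
        ys.append(c * h)
    return ys


# Toy sequence standing in for flattened patch features:
# a single impulse at the first position, zeros afterwards.
impulse = [1.0, 0.0, 0.0, 0.0]
outputs = ssm_scan(impulse, a=0.5)
print(outputs)
```

The last output is still nonzero even though only the first input was nonzero, which is the sense in which the recurrence propagates information across the full sequence, whereas a small convolution kernel only sees a fixed local window per layer.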

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Robot Manipulation and Learning