Mamba-Driven Topology Fusion for Monocular 3D Human Pose Estimation

Zenghao Zheng; Lianping Yang; Jinshan Pan; Hegui Zhu

arXiv:2505.20611 · cs.CV · September 29, 2025

TL;DR

This paper introduces a topology fusion framework that enhances Mamba-based (state space) models for monocular 3D human pose estimation by incorporating skeletal structure and local joint dependencies, resulting in improved accuracy and efficiency.

Contribution

It proposes the Bone Aware Module, graph convolutional enhancements, and a spatiotemporal refinement module to address Mamba's limitations in modeling human skeletal topology.

Findings

01. Reduces computational cost significantly.

02. Achieves higher accuracy on the Human3.6M and MPI-INF-3DHP datasets.

03. Demonstrates the effectiveness of each proposed module through ablation studies.

Abstract

Transformer-based methods for 3D human pose estimation face significant computational challenges because the complexity of the self-attention mechanism grows quadratically with sequence length. Recently, the Mamba model has substantially reduced computational overhead and demonstrated outstanding performance in modeling long sequences by leveraging the state space model (SSM). However, the SSM's sequential processing is not well suited to 3D joint sequences with topological structure, and the causal convolution in Mamba also lacks insight into local joint relationships. To address these issues, we propose the Mamba-Driven Topology Fusion framework. Specifically, the proposed Bone Aware Module infers the direction and length of bone vectors in the spherical coordinate system, providing effective topological guidance for the Mamba model in processing joint…
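To make the bone-vector idea concrete, here is a minimal sketch of how bone direction and length could be expressed in spherical coordinates from 3D joint positions. It is not the paper's Bone Aware Module, which is a learned component. The five-joint `BONES` list, the function name `bone_spherical`, and the (length, polar, azimuth) convention are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy skeleton as (child, parent) joint index pairs.
# This is an illustrative layout, not the Human3.6M joint ordering.
BONES = [(1, 0), (2, 1), (3, 0), (4, 3)]


def bone_spherical(joints, bones=BONES):
    """Convert joints of shape (..., J, 3) into bones of shape (..., B, 3).

    Each bone is returned as (length, polar angle, azimuth), where the polar
    angle is measured from the +z axis and the azimuth lies in the x-y plane.
    """
    joints = np.asarray(joints, dtype=float)
    child = joints[..., [c for c, _ in bones], :]
    parent = joints[..., [p for _, p in bones], :]
    vec = child - parent  # bone vector pointing from parent to child

    x, y, z = vec[..., 0], vec[..., 1], vec[..., 2]
    length = np.linalg.norm(vec, axis=-1)
    # arctan2 stays well defined for zero-length bones, unlike arccos(z / length).
    polar = np.arctan2(np.hypot(x, y), z)
    azimuth = np.arctan2(y, x)
    return np.stack([length, polar, azimuth], axis=-1)


# Usage: one pose with five joints.
pose = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0],
])
bones_sph = bone_spherical(pose)
```

Leading batch and frame axes pass through unchanged, so an array of shape (batch, frames, J, 3) yields (batch, frames, B, 3). Separating length from direction in this way is one plausible reason for choosing a spherical parameterization, since bone lengths are roughly constant over time while directions vary.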

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging