Mamba-Driven Topology Fusion for Monocular 3D Human Pose Estimation
Zenghao Zheng, Lianping Yang, Jinshan Pan, Hegui Zhu

TL;DR
This paper introduces a topology fusion framework that augments Mamba-based models for monocular 3D human pose estimation with skeletal structure and local joint dependencies, improving both accuracy and efficiency.
Contribution
It proposes the Bone Aware Module, graph convolutional enhancements, and a spatiotemporal refinement module to address Mamba's limitations in modeling human skeletal topology.
Findings
Reduces computational cost significantly.
Achieves higher accuracy on Human3.6M and MPI-INF-3DHP datasets.
Demonstrates effectiveness of each proposed module through ablation studies.
Abstract
Transformer-based methods for 3D human pose estimation face significant computational challenges because the complexity of self-attention grows quadratically with sequence length. Recently, the Mamba model has substantially reduced computational overhead and demonstrated outstanding performance in modeling long sequences by leveraging the state space model (SSM). However, the SSM's sequential processing is not well suited to 3D joint sequences with topological structure, and the causal convolution in Mamba lacks insight into local joint relationships. To address these issues, we propose the Mamba-Driven Topology Fusion framework. Specifically, the proposed Bone Aware Module infers the direction and length of bone vectors in the spherical coordinate system, providing effective topological guidance for the Mamba model in processing joint…
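To make the spherical-coordinate description of bones concrete, here is a minimal sketch of how a bone vector can be expressed as a length plus two direction angles. It is not the authors' Bone Aware Module, which infers these quantities inside the network; it only illustrates the coordinate conversion. The five-joint toy skeleton, the function names, and the angle convention (polar angle measured from the +z axis, azimuth in the x-y plane) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy skeleton: (child, parent) joint index pairs.
# A real dataset such as Human3.6M defines its own joint hierarchy.
BONES = [(1, 0), (2, 1), (3, 0), (4, 3)]


def bones_to_spherical(joints, bones=BONES):
    """Map 3D joints of shape (J, 3) to per-bone (length, polar, azimuth)."""
    joints = np.asarray(joints, dtype=float)
    child = joints[[c for c, _ in bones]]
    parent = joints[[p for _, p in bones]]
    vec = child - parent                       # bone vectors, shape (B, 3)
    x, y, z = vec[:, 0], vec[:, 1], vec[:, 2]
    length = np.linalg.norm(vec, axis=-1)
    # Assumed convention: polar angle from the +z axis, in [0, pi].
    polar = np.arctan2(np.hypot(x, y), z)
    # Azimuth in the x-y plane, in (-pi, pi].
    azimuth = np.arctan2(y, x)
    return np.stack([length, polar, azimuth], axis=-1)


def spherical_to_bones(sph):
    """Inverse mapping: (length, polar, azimuth) back to Cartesian bone vectors."""
    sph = np.asarray(sph, dtype=float)
    length, polar, azimuth = sph[:, 0], sph[:, 1], sph[:, 2]
    x = length * np.sin(polar) * np.cos(azimuth)
    y = length * np.sin(polar) * np.sin(azimuth)
    z = length * np.cos(polar)
    return np.stack([x, y, z], axis=-1)


# Toy pose: root at the origin, with two short chains hanging off it.
toy_joints = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [0.0, 2.0, -1.0],
])
toy_sph = bones_to_spherical(toy_joints)
```

Separating length from direction in this way means a constraint such as "bone length stays constant over time" becomes a statement about a single coordinate, which is one plausible reason to work in spherical rather than Cartesian form.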
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging
