Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network
Xinyi Zhang, Qiqi Bao, Qinpeng Cui, Wenming Yang, Qingmin Liao

TL;DR
Pose Magic introduces a hybrid Mamba-GCN network that achieves state-of-the-art accuracy in 3D human pose estimation while significantly reducing computational costs and maintaining temporal consistency.
Contribution
This work presents a novel hybrid spatiotemporal architecture combining Mamba and GCN for efficient, accurate, and temporally consistent 3D human pose estimation.
Findings
Sets a new state of the art in 3D human pose estimation, reducing error by 0.9 mm.
Reduces FLOPs by 74.1%.
Maintains motion consistency and generalizes to unseen sequences.
Abstract
Current state-of-the-art (SOTA) methods in 3D Human Pose Estimation (HPE) are primarily based on Transformers. However, existing Transformer-based 3D HPE backbones often encounter a trade-off between accuracy and computational efficiency. To resolve the above dilemma, in this work, we leverage recent advances in state space models and utilize Mamba for high-quality and efficient long-range modeling. Nonetheless, Mamba still faces challenges in precisely exploiting local dependencies between joints. To address these issues, we propose a new attention-free hybrid spatiotemporal architecture named Hybrid Mamba-GCN (Pose Magic). This architecture introduces local enhancement with GCN by capturing relationships between neighboring joints, thus producing new representations to complement Mamba's outputs. By adaptively fusing representations from Mamba and GCN, Pose Magic demonstrates superior…
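The hybrid idea in the abstract (a state space branch for long-range modeling, a GCN branch for neighboring joints, and an adaptive fusion of the two) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The temporal branch below is a plain diagonal linear state space recurrence standing in for Mamba's selective scan, the fusion is one possible form (per-token softmax weights over the two branches), and every name, shape, and the 5-joint chain skeleton is a hypothetical choice made for illustration.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_branch(x, A_hat, W):
    """One graph convolution over joints. x: (T, J, C), A_hat: (J, J), W: (C, C)."""
    aggregated = np.einsum("ij,tjc->tic", A_hat, x)  # mix features of neighboring joints
    return np.maximum(aggregated @ W, 0.0)           # linear projection + ReLU


def ssm_branch(x, a, b, c):
    """Diagonal linear state space scan over time (a stand-in, NOT Mamba's selective scan).

    h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t, applied per joint and per channel.
    """
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = c * h
    return out


def adaptive_fusion(y_ssm, y_gcn, W_gate):
    """Assumed fusion rule: per-token softmax weights over the two branches."""
    logits = np.concatenate([y_ssm, y_gcn], axis=-1) @ W_gate  # (T, J, 2)
    logits = logits - logits.max(axis=-1, keepdims=True)       # numerical stability
    w = np.exp(logits)
    w = w / w.sum(axis=-1, keepdims=True)
    fused = w[..., :1] * y_ssm + w[..., 1:] * y_gcn
    return fused, w


rng = np.random.default_rng(0)
T, J, C = 8, 5, 4                          # frames, joints, channels (toy sizes)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]   # hypothetical chain skeleton

x = rng.standard_normal((T, J, C))
A_hat = normalized_adjacency(edges, J)
W = rng.standard_normal((C, C)) * 0.1
a, b, c = np.full(C, 0.9), np.ones(C), np.ones(C)
W_gate = rng.standard_normal((2 * C, 2)) * 0.1

y_gcn = gcn_branch(x, A_hat, W)
y_ssm = ssm_branch(x, a, b, c)
fused, weights = adaptive_fusion(y_ssm, y_gcn, W_gate)
print(fused.shape, weights.shape)
```

Because the fusion weights sum to one for every frame and joint, each fused feature is a convex combination of the two branch outputs, so neither the global (state space) nor the local (graph) representation can be discarded outright unless the gate saturates.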
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Gait Recognition and Analysis
Methods: Graph Convolutional Network · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
