STGFormer: Spatio-Temporal GraphFormer for 3D Human Pose Estimation in Video
Yang Liu, Zhiyong Zhang

TL;DR
STGFormer introduces a novel spatio-temporal graph attention framework that effectively models human body structure and dynamics, significantly improving 3D human pose estimation accuracy in videos.
Contribution
The paper proposes the Spatio-Temporal criss-cross Graph attention mechanism and a dual-path MHR-GCN to better capture spatiotemporal dependencies and higher-order information in video-based 3D pose estimation.
Findings
Achieves state-of-the-art results on the Human3.6M dataset.
Outperforms previous methods on the MPI-INF-3DHP dataset.
Effectively models long-range spatiotemporal dependencies.
Abstract
The current methods of video-based 3D human pose estimation have achieved significant progress. However, they still face pressing challenges, such as the underutilization of spatiotemporal body structure features in transformers and the inadequate granularity of spatiotemporal interaction modeling in graph convolutional networks, which leads to pervasive depth ambiguity in monocular 3D human pose estimation. To address these limitations, this paper presents the Spatio-Temporal GraphFormer framework (STGFormer) for 3D human pose estimation in videos. First, we introduce a Spatio-Temporal criss-cross Graph (STG) attention mechanism, designed to more effectively leverage the inherent graph priors of the human body within continuous sequence distributions while capturing spatiotemporal long-range dependencies. Next, we present a dual-path Modulated Hop-wise Regular GCN (MHR-GCN) to…
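The abstract does not spell out the exact attention pattern, so the following is only a rough sketch of one plausible reading of "criss-cross" attention with a graph prior. It assumes each (frame, joint) token attends along two arms: a spatial arm (itself and its skeleton neighbours in the same frame) and a temporal arm (the same joint in every frame). The toy skeleton `TOY_EDGES` and the helpers `criss_cross_neighbors` and `build_mask` are hypothetical names introduced for illustration; they are not taken from the paper.

```python
from itertools import product

# Hypothetical 5-joint toy skeleton (NOT the Human3.6M layout):
# each edge is a pair of joint indices, with joint 1 acting as a hub.
TOY_EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def criss_cross_neighbors(frame, joint, num_frames, edges):
    """Return the set of (frame, joint) tokens one token may attend to.

    Assumption (not confirmed by the source): the pattern is the union of
      - a spatial arm: the token and its skeleton neighbours in the same frame
      - a temporal arm: the same joint in every frame of the clip
    """
    adjacent = {joint}
    for a, b in edges:
        if a == joint:
            adjacent.add(b)
        elif b == joint:
            adjacent.add(a)
    spatial = {(frame, j) for j in adjacent}
    temporal = {(t, joint) for t in range(num_frames)}
    return spatial | temporal


def build_mask(num_frames, num_joints, edges):
    """Build a boolean mask over flattened tokens (frame-major order).

    mask[q][k] is True when query token q may attend to key token k.
    """
    tokens = list(product(range(num_frames), range(num_joints)))
    index = {tok: i for i, tok in enumerate(tokens)}
    mask = [[False] * len(tokens) for _ in tokens]
    for t, j in tokens:
        for neighbor in criss_cross_neighbors(t, j, num_frames, edges):
            mask[index[(t, j)]][index[neighbor]] = True
    return mask


if __name__ == "__main__":
    mask = build_mask(num_frames=3, num_joints=5, edges=TOY_EDGES)
    # Print the allowed keys for the token at frame 1, joint 2.
    query = 1 * 5 + 2
    print([k for k, allowed in enumerate(mask[query]) if allowed])
```

A mask like this could be passed to a standard attention layer to restrict which keys each query sees. Under this assumed pattern, each token's attention cost scales with the number of frames plus its joint degree rather than with the full frames-times-joints token count, while any token remains reachable from any other within a few stacked layers.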
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
Methods: Softmax · Attention Is All You Need · Graph Convolutional Network
