STGFormer: Spatio-Temporal GraphFormer for 3D Human Pose Estimation in Video

Yang Liu; Zhiyong Zhang

arXiv:2407.10099 · cs.CV · August 20, 2025

TL;DR

STGFormer is a spatio-temporal graph attention framework that models human body structure and dynamics, and reports higher 3D human pose estimation accuracy in video than previous methods on Human3.6M and MPI-INF-3DHP.

Contribution

The paper proposes a Spatio-Temporal criss-cross Graph (STG) attention mechanism and a dual-path Modulated Hop-wise Regular GCN (MHR-GCN) to better capture spatiotemporal dependencies and higher-order information in video-based 3D pose estimation.
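To make "hop-wise" concrete, the following is a minimal sketch of how per-hop neighborhoods can be derived from a skeleton graph. It is an illustration under stated assumptions, not the paper's MHR-GCN: the function names `hop_distance` and `hop_masks` and the five-joint toy skeleton are hypothetical, and the way the paper modulates or combines the hops is not shown here.

```python
from collections import deque

import numpy as np


def hop_distance(num_nodes, edges):
    """Shortest-path hop count between every pair of joints (BFS per source)."""
    neighbors = [[] for _ in range(num_nodes)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    dist = np.full((num_nodes, num_nodes), -1, dtype=int)  # -1 marks "unreached"
    for source in range(num_nodes):
        dist[source, source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[source, v] < 0:
                    dist[source, v] = dist[source, u] + 1
                    queue.append(v)
    return dist


def hop_masks(dist, max_hop):
    """Stack of binary masks; masks[k][i, j] is 1 when joint j is exactly k hops from i."""
    return np.stack([(dist == k).astype(float) for k in range(max_hop + 1)])


# Hypothetical toy skeleton: a chain 0-1-2-3 with a side branch 1-4.
toy_edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
toy_dist = hop_distance(5, toy_edges)
toy_masks = hop_masks(toy_dist, max_hop=2)
```

Each mask could then weight a separate aggregation step in a graph convolution, so that joints two or more bones apart contribute through their own term rather than only through repeated 1-hop propagation.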

Findings

01

Achieves state-of-the-art results on the Human3.6M dataset.

02

Outperforms previous methods on the MPI-INF-3DHP dataset.

03

Effectively models long-range spatiotemporal dependencies.

Abstract

Current methods for video-based 3D human pose estimation have achieved significant progress. However, they still face pressing challenges, such as the underutilization of spatiotemporal body structure features in transformers and the inadequate granularity of spatiotemporal interaction modeling in graph convolutional networks, which lead to pervasive depth ambiguity in monocular 3D human pose estimation. To address these limitations, this paper presents the Spatio-Temporal GraphFormer framework (STGFormer) for 3D human pose estimation in videos. First, we introduce a Spatio-Temporal criss-cross Graph (STG) attention mechanism, designed to more effectively leverage the inherent graph priors of the human body within continuous sequence distributions while capturing spatiotemporal long-range dependencies. Next, we present a dual-path Modulated Hop-wise Regular GCN (MHR-GCN) to…
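The criss-cross idea in the abstract, attending across joints within a frame and across frames for a joint, can be sketched as follows. This is a simplified illustration and not the authors' STG attention: it assumes the feature channels are split in half between a spatial and a temporal branch, it omits learned query/key/value projections, multiple heads, and the graph prior, and the names `axis_attention` and `criss_cross_attention` are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def axis_attention(x):
    """Scaled dot-product self-attention over the token axis.

    x has shape (batch, tokens, channels); the input itself serves as
    query, key, and value (no learned projections in this sketch).
    """
    channels = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(channels)  # (batch, tokens, tokens)
    return softmax(scores, axis=-1) @ x


def criss_cross_attention(x):
    """x has shape (frames, joints, channels) with an even channel count."""
    half = x.shape[-1] // 2
    x_spatial, x_temporal = x[..., :half], x[..., half:]

    # Spatial branch: frames act as the batch, joints attend to joints.
    spatial = axis_attention(x_spatial)

    # Temporal branch: joints act as the batch, frames attend to frames.
    temporal = np.swapaxes(axis_attention(np.swapaxes(x_temporal, 0, 1)), 0, 1)

    # Concatenate the two branches back to the original channel width.
    return np.concatenate([spatial, temporal], axis=-1)


# Example input: 9 frames, 17 joints, 8 channels of random features.
rng = np.random.default_rng(0)
features = rng.standard_normal((9, 17, 8))
mixed = criss_cross_attention(features)
```

Splitting the channels keeps each branch's attention matrix small (joints × joints or frames × frames) instead of forming one matrix over every joint in every frame, which is the usual motivation for axis-wise attention over long sequences.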

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods

Methods: Softmax · Attention Is All You Need · Graph Convolutional Network