Double-chain Constraints for 3D Human Pose Estimation in Images and Videos
Hongbo Kang, Yong Wang, Mengyuan Liu, Doudou Wu, Peng Liu, Wenming Yang

TL;DR
This paper introduces a novel Double-chain Graph Convolutional Transformer model that effectively captures multi-level dependencies in 3D human pose estimation from images and videos, achieving state-of-the-art results.
Contribution
The paper proposes a double-chain design combining GCN and Transformer, along with modules for local and global constraints, to improve 3D human pose estimation accuracy.
Findings
Achieves state-of-the-art performance on Human3.6M and MPI-INF-3DHP datasets.
Effective integration of temporal information with minimal computational overhead.
Outperforms previous methods across all action categories in key datasets.
Abstract
Reconstructing 3D poses from 2D poses lacking depth information is particularly challenging due to the complexity and diversity of human motion. The key is to effectively model the spatial constraints between joints to leverage their inherent dependencies. Thus, we propose a novel model, called Double-chain Graph Convolutional Transformer (DC-GCT), to constrain the pose through a double-chain design consisting of local-to-global and global-to-local chains to obtain a complex representation more suitable for the current human pose. Specifically, we combine the advantages of GCN and Transformer and design a Local Constraint Module (LCM) based on GCN and a Global Constraint Module (GCM) based on self-attention mechanism as well as a Feature Interaction Module (FIM). The proposed method fully captures the multi-level dependencies between human body joints to optimize the modeling capability…
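To make the local and global constraints described in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a common 17-joint Human3.6M-style skeleton, a single graph-convolution step standing in for the GCN-based local constraint, and single-head scaled dot-product self-attention standing in for the global constraint. The names `EDGES`, `local_constraint`, and `global_constraint`, the feature width of 16, and the random weights are all hypothetical. The Feature Interaction Module is not modeled because the abstract does not describe its internals.

```python
import numpy as np

# Assumed 17-joint skeleton (a common Human3.6M-style layout); illustrative only.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8),
         (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16)]
NUM_JOINTS = 17


def normalized_adjacency(num_joints, edges):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def local_constraint(x, adj, w):
    """Graph convolution: each joint mixes features only with skeletal neighbours."""
    return np.maximum(adj @ x @ w, 0.0)  # ReLU activation


def global_constraint(x, wq, wk, wv):
    """Single-head self-attention: every joint can attend to every other joint."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return attn @ v, attn


rng = np.random.default_rng(0)
dim = 16  # hypothetical feature width
pose_2d = rng.standard_normal((NUM_JOINTS, 2))        # stand-in for detected 2D joints
x = pose_2d @ (rng.standard_normal((2, dim)) * 0.1)   # per-joint embedding

adj = normalized_adjacency(NUM_JOINTS, EDGES)
w_gcn = rng.standard_normal((dim, dim)) * 0.1
wq, wk, wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))

# Local-to-global ordering: graph convolution first, then self-attention.
local = local_constraint(x, adj, w_gcn)
glob, attn = global_constraint(local, wq, wk, wv)
print(local.shape, glob.shape, attn.shape)
```

The adjacency matrix restricts the first step to physically connected joints, while the attention matrix in the second step is dense, so distant joints such as a wrist and an ankle can influence each other. Swapping the order of the two calls would give the global-to-local direction under the same assumptions.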
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Label Smoothing · Adam · Residual Connection · Dense Connections · Dropout
