Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos
Junhao Zhang, Yali Wang, Zhipeng Zhou, Tianyu Luan, Zhe Wang, Yu Qiao

TL;DR
This paper introduces DG-Net, a dynamic graph convolutional network that adaptively learns spatial and temporal human-joint relations in videos, improving 3D pose estimation accuracy over fixed-structure methods.
Contribution
The paper proposes a novel DG-Net with dynamical spatial/temporal graph convolutions that adaptively identify joint affinities for better 3D pose estimation in videos.
Findings
DG-Net outperforms recent state-of-the-art methods on the benchmarks evaluated in the paper.
It requires fewer input frames and a smaller model size.
It is effective at reducing depth ambiguity and motion uncertainty.
Abstract
Graph Convolution Network (GCN) has been successfully used for 3D human pose estimation in videos. However, it is often built on the fixed human-joint affinity, according to human skeleton. This may reduce adaptation capacity of GCN to tackle complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity, and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos. Different from traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph convolution (DSG/DTG) to discover spatial/temporal human-joint affinity for each video exemplar, depending on spatial distance/temporal movement similarity between human joints in this video. Hence, they can effectively understand which joints are spatially closer and/or have consistent…
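To make the idea in the abstract concrete, here is a minimal NumPy sketch of building a per-video joint affinity from spatial distance and from movement similarity, then applying one graph-convolution step. It is an illustration under stated assumptions, not the authors' implementation. The function names, the k-nearest-neighbour selection rule, the cosine-similarity measure, and the row normalization are hypothetical choices; the abstract does not specify how DSG/DTG compute affinity.

```python
import numpy as np

def _top_k_row_normalized(score, k):
    """Keep the k+1 highest-scoring joints per row (usually including the joint itself)."""
    nearest = np.argsort(-score, axis=-1)[..., : k + 1]
    adj = np.zeros_like(score)
    np.put_along_axis(adj, nearest, 1.0, axis=-1)
    # Row-normalize so each joint averages over its selected neighbours.
    return adj / adj.sum(axis=-1, keepdims=True)

def spatial_affinity(poses, k=3):
    """Assumed rule: in each frame, link every joint to its k spatially closest joints.

    poses: array of shape (T, J, D) = (frames, joints, coordinates).
    Returns an affinity of shape (T, J, J), one graph per frame.
    """
    diff = poses[:, :, None, :] - poses[:, None, :, :]   # (T, J, J, D)
    dist = np.linalg.norm(diff, axis=-1)                 # (T, J, J)
    return _top_k_row_normalized(-dist, k)               # smaller distance = higher score

def temporal_affinity(poses, k=3):
    """Assumed rule: link joints whose frame-to-frame motion is most similar.

    Returns an affinity of shape (J, J) for the whole clip.
    """
    motion = np.diff(poses, axis=0)                      # (T-1, J, D)
    traj = motion.transpose(1, 0, 2).reshape(poses.shape[1], -1)
    norm = np.linalg.norm(traj, axis=1, keepdims=True)
    unit = traj / np.maximum(norm, 1e-8)                 # guard against static joints
    sim = unit @ unit.T                                  # cosine similarity, (J, J)
    return _top_k_row_normalized(sim, k)

def graph_conv(x, adj, weight):
    """One plain graph-convolution step: aggregate neighbours, then project features."""
    return adj @ x @ weight

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.normal(size=(8, 17, 2))      # 8 frames, 17 joints, 2D keypoints (toy data)
    w = rng.normal(size=(2, 16))            # hypothetical learnable projection
    h_spatial = graph_conv(clip, spatial_affinity(clip), w)
    h_temporal = graph_conv(clip, temporal_affinity(clip), w)
    print(h_spatial.shape, h_temporal.shape)
```

The point of the sketch is the contrast with a fixed skeleton graph: here the adjacency is recomputed from each input clip, so two videos with different poses or motions yield different neighbourhoods for the same joint.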
Taxonomy
Methods: Discriminative and Generative Network · Graph Convolutional Network · Convolution
