Multi-hop graph transformer network for 3D human pose estimation

Zaedul Islam; A. Ben Hamza

arXiv:2405.03055·cs.CV·May 7, 2024

TL;DR

This paper presents a multi-hop graph transformer network that effectively captures spatio-temporal dependencies for 3D human pose estimation from videos, addressing occlusion and depth ambiguity challenges.

Contribution

It introduces a novel architecture combining multi-head self-attention and multi-hop graph convolutional networks with disentangled neighborhoods for improved 3D pose estimation.
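A minimal sketch of what "disentangled neighborhoods" could look like in practice, assuming the usual reading in which the k-th hop matrix links two joints only when their shortest-path distance is exactly k (so closer neighbors are not counted again at larger hops). The function name `k_hop_adjacency` and the 5-joint chain skeleton are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def k_hop_adjacency(adj, max_hop):
    """Return [A_0, ..., A_K] where A_k[i, j] = 1 iff the shortest-path
    distance between joints i and j is exactly k (assumed definition)."""
    n = adj.shape[0]
    # Binary adjacency with self-loops: reaching "within one hop".
    a_hat = ((adj + np.eye(n)) > 0).astype(np.int64)
    power = np.eye(n, dtype=np.int64)          # pairs within distance <= 0
    reach_prev = np.zeros((n, n), dtype=bool)  # pairs within distance <= k-1
    hops = []
    for _ in range(max_hop + 1):
        reach_k = power > 0                    # pairs within distance <= k
        hops.append((reach_k & ~reach_prev).astype(np.float64))
        reach_prev = reach_k
        power = np.minimum(power @ a_hat, 1)   # clip to keep entries binary
    return hops

# Hypothetical toy skeleton: a chain of 5 joints, 0-1-2-3-4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

hops = k_hop_adjacency(adj, max_hop=4)
```

Each matrix in `hops` can then be given its own weights in a graph convolution, so that distant joints contribute through a separate term instead of being mixed in with immediate neighbors.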

Findings

01 Achieves competitive results on benchmark datasets.

02 Effectively models long-range spatio-temporal dependencies.

03 Demonstrates strong generalization ability.

Abstract

Accurate 3D human pose estimation is a challenging task due to occlusion and depth ambiguity. In this paper, we introduce a multi-hop graph transformer network designed for 2D-to-3D human pose estimation in videos by leveraging the strengths of multi-head self-attention and multi-hop graph convolutional networks with disentangled neighborhoods to capture spatio-temporal dependencies and handle long-range interactions. The proposed network architecture consists of a graph attention block composed of stacked layers of multi-head self-attention and graph convolution with a learnable adjacency matrix, and a multi-hop graph convolutional block comprising multi-hop convolutional and dilated convolutional layers. The combination of multi-head self-attention and multi-hop graph convolutional layers enables the model to capture both local and global dependencies, while the integration of dilated…
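The graph attention block described in the abstract can be sketched roughly as follows. This is an illustrative NumPy forward pass under stated assumptions, not the authors' implementation: multi-head self-attention is applied across the joints of one frame, followed by a graph convolution whose adjacency is a row-normalized skeleton matrix plus a learnable offset. The names `multi_head_self_attention`, `graph_conv`, and `adj_offset`, the row normalization, the ReLU, the random weights, and the toy sizes (5 joints, 8 features, 2 heads) are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (N joints, d features). Returns (output (N, d), attention (H, N, N))."""
    n, d = x.shape
    d_h = d // num_heads

    def split(t):
        # (N, d) -> (H, N, d_h)
        return t.reshape(n, num_heads, d_h).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)    # (H, N, N)
    attn = softmax(scores, axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)   # merge heads
    return out @ w_o, attn

def graph_conv(x, adj, adj_offset, w):
    """Graph convolution with (assumed) row-normalized adjacency plus a
    learnable offset matrix, followed by ReLU."""
    n = adj.shape[0]
    a = adj + np.eye(n)                      # add self-loops
    a = a / a.sum(axis=1, keepdims=True)     # row-normalize the fixed skeleton
    return np.maximum((a + adj_offset) @ x @ w, 0.0)

# Hypothetical toy setup: 5-joint chain skeleton, 8 features, 2 heads.
rng = np.random.default_rng(0)
n_joints, d_model, n_heads = 5, 8, 2
adj = np.zeros((n_joints, n_joints))
for i in range(n_joints - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

x = rng.standard_normal((n_joints, d_model))
w_q, w_k, w_v, w_o, w_g = (0.1 * rng.standard_normal((d_model, d_model))
                           for _ in range(5))
adj_offset = np.zeros((n_joints, n_joints))  # would be trained in practice

y, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads)
z = graph_conv(x + y, adj, adj_offset, w_g)  # residual connection, then GCN
```

The attention step lets every joint attend to every other joint regardless of skeleton distance, while the graph convolution keeps the skeleton structure as a prior that the learnable offset can adjust.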

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications

Methods: Attention Is All You Need · Dropout · Label Smoothing · Residual Connection · Softmax · Laplacian EigenMap · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer · Byte Pair Encoding