TL;DR
This paper introduces PGFormer, a lightweight multi-scale transformer with pyramid graph attention for 3D human pose estimation. It models long-range dependencies across body parts to improve accuracy while reducing model size.
Contribution
It proposes a novel Pyramid Graph Attention module and a pyramid-structured transformer architecture to better capture long-range dependencies in 3D human pose estimation.
Findings
Achieves lower error than state-of-the-art methods.
Uses a smaller model while achieving comparable or better performance.
Demonstrates effectiveness on the Human3.6M and MPI-INF-3DHP datasets.
Abstract
Action coordination within the human body structure supplies spatial constraints on 2D joints that are indispensable for recovering 3D pose. Action coordination is usually represented as long-range dependencies among body parts. However, modeling these long-range dependencies poses two main challenges. First, a joint should not only be constrained by other individual joints but also be modulated by body parts. Second, existing methods deepen their networks to learn dependencies between non-linked parts, which introduces uncorrelated noise and increases model size. In this paper, we use a pyramid structure to better learn potential long-range dependencies. It captures correlations across joints and groups, complementing the context of human sub-structures, and models these pyramid-structured long-range dependencies effectively across scales. Specifically, we propose a novel Pyramid…
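The abstract's core idea, letting each joint attend both to other individual joints and to coarser body-part groups, can be illustrated with a minimal NumPy sketch. This is not PGFormer's Pyramid Graph Attention. The 17-joint order, the five-part grouping (`PARTS`), mean pooling as the coarse-scale operator, plain single-head dot-product attention with no graph or adjacency terms, and the helper names `part_pooling_matrix` and `cross_scale_attention` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical partition of a 17-joint skeleton into 5 body parts.
# Assumes a Human3.6M-style joint order; NOT taken from the paper.
PARTS = [
    [0, 7, 8, 9, 10],   # pelvis, spine, thorax, neck, head
    [1, 2, 3],          # right leg
    [4, 5, 6],          # left leg
    [11, 12, 13],       # left arm
    [14, 15, 16],       # right arm
]


def part_pooling_matrix(parts, num_joints):
    """Return a (num_parts, num_joints) matrix that averages joint features into part features."""
    P = np.zeros((len(parts), num_joints))
    for g, joints in enumerate(parts):
        P[g, joints] = 1.0 / len(joints)
    return P


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_scale_attention(X, Wq, Wk, Wv, parts):
    """Single-head attention in which each joint attends to all joints and all parts.

    X: (J, C) joint features; Wq, Wk, Wv: (C, D) projection matrices.
    Returns the (J, D) updated joint features and the (J, J + G) attention map.
    """
    P = part_pooling_matrix(parts, X.shape[0])
    part_feats = P @ X                                 # (G, C): coarse (part) scale
    tokens = np.concatenate([X, part_feats], axis=0)   # (J + G, C): both scales
    Q = X @ Wq                                         # queries come from joints only
    K = tokens @ Wk                                    # keys and values span both scales
    V = tokens @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)  # (J, J + G)
    return A @ V, A


rng = np.random.default_rng(0)
J, C, D = 17, 32, 16
X = rng.standard_normal((J, C))
Wq, Wk, Wv = (rng.standard_normal((C, D)) * 0.1 for _ in range(3))
out, attn = cross_scale_attention(X, Wq, Wk, Wv, PARTS)
print(out.shape, attn.shape)  # (17, 16) (17, 22)
```

Because queries come only from joints while keys and values span both scales, each row of `attn` mixes joint-to-joint and joint-to-part weights. A deeper pyramid in this sketch would add further pooling levels (for example, a single whole-body token) in the same way.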
Taxonomy
Methods: Laplacian EigenMap · Absolute Position Encodings · Layer Normalization · Laplacian Positional Encodings · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer
