Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation

Mingjie Wei; Xuemei Xie; Yutong Zhong; Guangming Shi

arXiv:2506.02853·cs.CV·June 4, 2025

TL;DR

This paper introduces PGFormer, a lightweight multi-scale transformer with pyramid graph attention for 3D human pose estimation. It models long-range dependencies across body parts to improve accuracy while reducing model size.

Contribution

It proposes a novel Pyramid Graph Attention module and a pyramid-structured transformer architecture to better capture long-range dependencies in 3D human pose estimation.

Findings

01 Achieves lower error than state-of-the-art methods.

02 Uses a smaller model for comparable or better performance.

03 Demonstrates effectiveness on the Human3.6M and MPI-INF-3DHP datasets.

Abstract

Action coordination in human structure is indispensable for the spatial constraints of 2D joints to recover 3D pose. Usually, action coordination is represented as a long-range dependence among body parts. However, there are two main challenges in modeling long-range dependencies. First, joints should not only be constrained by other individual joints but also be modulated by the body parts. Second, existing methods make networks deeper to learn dependencies between non-linked parts. They introduce uncorrelated noise and increase the model size. In this paper, we utilize a pyramid structure to better learn potential long-range dependencies. It can capture the correlation across joints and groups, which complements the context of the human sub-structure. In an effective cross-scale way, it captures the pyramid-structured long-range dependence. Specifically, we propose a novel Pyramid…
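The idea of letting joints be modulated by body parts as well as by other joints can be illustrated with a small sketch. This is not the paper's Pyramid Graph Attention module, whose definition is cut off in the abstract above. It is a minimal, parameter-free illustration under stated assumptions: a 17-joint skeleton, a hypothetical grouping of joints into five body parts (`GROUPS`, with illustrative indices), mean pooling to form group tokens, and plain scaled dot-product attention in which each joint attends over both joints and group tokens. All function names are hypothetical.

```python
import math

# Hypothetical grouping of 17 joints into 5 body parts.
# The indices are illustrative and not taken from the paper.
GROUPS = {
    "torso": [0, 7, 8, 9, 10],
    "right_leg": [1, 2, 3],
    "left_leg": [4, 5, 6],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
}


def mean_pool(features, groups):
    """Average joint features within each group: one vector per body part."""
    dim = len(features[0])
    pooled = []
    for indices in groups.values():
        pooled.append(
            [sum(features[i][d] for i in indices) / len(indices) for d in range(dim)]
        )
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_scale_attention(joint_feats, groups=GROUPS):
    """Each joint attends over all joints (fine scale) and group tokens (coarse scale)."""
    group_feats = mean_pool(joint_feats, groups)
    keys = joint_feats + group_feats  # fine-scale tokens followed by coarse-scale tokens
    dim = len(joint_feats[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in joint_feats:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        w = softmax(scores)
        # Weighted sum of the key vectors (keys double as values in this sketch).
        outputs.append(
            [sum(w[j] * keys[j][d] for j in range(len(keys))) for d in range(dim)]
        )
        weights.append(w)
    return outputs, weights


# Toy input: 17 joints with 4-dimensional features.
feats = [[math.sin(j + d) for d in range(4)] for j in range(17)]
updated, attn = cross_scale_attention(feats)
print(len(updated), len(attn[0]))  # 17 joints, each attending over 17 + 5 = 22 tokens
```

In this sketch a wrist joint can draw on a leg's pooled token in a single attention step, rather than relying on many stacked layers to connect non-linked parts. A real model would add learned query, key, and value projections and, presumably, further pyramid levels.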

Code & Models

Repositories

MingjieWe/PGFormer (PyTorch, official)

Taxonomy

Methods: Laplacian EigenMap · Absolute Position Encodings · Layer Normalization · Laplacian Positional Encodings · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer