PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation
Ming Xu, Xu Zhang

TL;DR
PoseGRAF is a framework for monocular 3D human pose estimation that models joint and bone dependencies with graph convolution, cross-attention, and adaptive fusion. It is reported to outperform existing methods on the Human3.6M and MPI-INF-3DHP datasets.
Contribution
The paper presents a dual graph convolutional structure with cross-attention and dynamic fusion modules, along with an improved Transformer encoder, to enhance 3D pose estimation accuracy and robustness.
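A minimal sketch of the dual graph idea, assuming a hypothetical 5-joint toy skeleton and a plain symmetric-normalized graph convolution (the paper's actual layer design is not given here). The names `PARENTS`, `bone_directions`, `bone_edges`, `normalized_adjacency`, and `graph_conv` are illustrative, not taken from the authors' code. It derives unit bone directions from 2D joints, builds one graph over joints and one over bones, and runs a single convolution on each.

```python
import numpy as np

# Hypothetical toy skeleton: PARENTS[i] is the parent of joint i (-1 marks the root).
PARENTS = [-1, 0, 1, 0, 3]

def bone_directions(joints, parents):
    """Unit vectors pointing from each non-root joint's parent to that joint."""
    child = np.array([i for i, p in enumerate(parents) if p >= 0])
    parent = np.array([parents[i] for i in child])
    vec = joints[child] - joints[parent]
    norm = np.linalg.norm(vec, axis=-1, keepdims=True)
    return vec / np.maximum(norm, 1e-8)  # guard against zero-length bones

def joint_edges(parents):
    """Joint graph: one edge per (child, parent) pair."""
    return [(i, p) for i, p in enumerate(parents) if p >= 0]

def bone_edges(parents):
    """Bone graph: two bones are neighbours when they share a joint."""
    bones = joint_edges(parents)
    return [(a, b)
            for a in range(len(bones))
            for b in range(a + 1, len(bones))
            if set(bones[a]) & set(bones[b])]

def normalized_adjacency(edges, n):
    """D^{-1/2} (A + I) D^{-1/2} for an undirected graph with n nodes."""
    a = np.eye(n)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(x, a_hat, w):
    """One graph convolution followed by ReLU: relu(A_hat @ X @ W)."""
    return np.maximum(a_hat @ x @ w, 0.0)

# Toy 2D keypoints, one row per joint.
joints_2d = np.array([[0., 0.], [1., 0.], [2., 0.], [0., 1.], [0., 3.]])
bones = bone_directions(joints_2d, PARENTS)

rng = np.random.default_rng(0)
a_joint = normalized_adjacency(joint_edges(PARENTS), len(PARENTS))
a_bone = normalized_adjacency(bone_edges(PARENTS), len(bones))
joint_feat = graph_conv(joints_2d, a_joint, rng.normal(size=(2, 8)))  # (5, 8)
bone_feat = graph_conv(bones, a_bone, rng.normal(size=(2, 8)))        # (4, 8)
```

Keeping the two graphs separate means each branch aggregates only over its own neighbourhood (adjacent joints, or bones meeting at a joint) before any cross-branch interaction takes place.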
Findings
Outperforms state-of-the-art methods on the Human3.6M and MPI-INF-3DHP datasets.
Demonstrates strong generalization on in-the-wild videos.
Effectively captures joint and bone dependencies for plausible pose estimation.
Abstract
Existing monocular 3D pose estimation methods primarily rely on joint positional features, while overlooking intrinsic directional and angular correlations within the skeleton. As a result, they often produce implausible poses under joint occlusions or rapid motion changes. To address these challenges, we propose the PoseGRAF framework. We first construct a dual graph convolutional structure that separately processes joint and bone graphs, effectively capturing their local dependencies. A Cross-Attention module is then introduced to model interdependencies between bone directions and joint features. Building upon this, a dynamic fusion module is designed to adaptively integrate both feature types by leveraging the relational dependencies between joints and bones. An improved Transformer encoder is further incorporated in a residual manner to generate the final output. Experimental…
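The cross-attention and fusion steps described in the abstract can be illustrated with a small NumPy sketch. It assumes single-head scaled dot-product attention, with joint features as queries and bone features as keys and values, and a sigmoid gate for the fusion. These are common choices and are assumptions here: the abstract does not specify the exact formulation, and the function names and weight shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(joint_feat, bone_feat, wq, wk, wv):
    """Joints attend to bones: queries from joints, keys/values from bones."""
    q = joint_feat @ wq                      # (J, d)
    k = bone_feat @ wk                       # (B, d)
    v = bone_feat @ wv                       # (B, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (J, B)
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v, attn

def gated_fusion(joint_feat, attended, wg):
    """Assumed fusion rule: a per-feature gate mixes the two inputs."""
    gate_in = np.concatenate([joint_feat, attended], axis=-1)  # (J, 2d)
    gate = 1.0 / (1.0 + np.exp(-(gate_in @ wg)))               # values in (0, 1)
    return gate * joint_feat + (1.0 - gate) * attended

# Toy shapes: 5 joints, 4 bones, 8 feature channels (all hypothetical).
rng = np.random.default_rng(0)
joint_feat = rng.normal(size=(5, 8))
bone_feat = rng.normal(size=(4, 8))
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
wg = rng.normal(size=(16, 8))

attended, attn = cross_attention(joint_feat, bone_feat, wq, wk, wv)
fused = gated_fusion(joint_feat, attended, wg)  # (5, 8)
```

Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the joint feature and the bone-informed feature, which is one simple way to make the mixing adaptive per joint and per channel.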
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Anomaly Detection Techniques and Applications
Methods: Dropout · Dense Connections · Concatenated Skip Connection · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Transformer
