Rethinking Graph Convolution for 2D-to-3D Hand Pose Lifting

Chanyoung Kim; Donghyun Kim; Dong-Hyun Sim; Seong Jae Hwang; Youngjoong Kwon

arXiv:2605.13604·cs.CV·May 14, 2026

TL;DR

In controlled, parameter-matched ablations on the FPHA benchmark, standard multi-head self-attention outperforms graph convolutional networks (GCNs) for 2D-to-3D hand pose lifting, and hand topology works best as a soft prior rather than as a fixed adjacency graph.

Contribution

The paper shows that self-attention, which aggregates joint features with input-dependent weights, outperforms GCNs, and that hand topology is best used as a soft positional encoding rather than a fixed adjacency.

Findings

01

Against a GCN strengthened with multi-hop adjacency and matched in parameter count, self-attention reduces MPJPE from 12.36 mm to 10.09 mm on FPHA.

02

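Skeleton-constrained graph attention recovers most of the gap between the GCN and self-attention, indicating that input-dependent aggregation is a major source of the improvement.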

03

Hand topology as a soft positional encoding is more effective than fixed adjacency.
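
The MPJPE values in the findings above are mean per-joint position errors: the Euclidean distance between each predicted and ground-truth 3D joint, averaged over joints. The following is a minimal sketch of that metric, assuming joints are given as (x, y, z) tuples in millimetres. The function name `mpjpe` and the toy coordinates are illustrative and not taken from the paper, and the paper's exact evaluation protocol (for example, any alignment step) is not stated on this page.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error between two lists of (x, y, z) joints."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    total = 0.0
    for p, t in zip(pred, target):
        # Euclidean distance between one predicted joint and its ground truth.
        total += math.dist(p, t)
    return total / len(pred)


# Toy example with two joints (hypothetical values, in mm).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(pred, target)  # per-joint errors are 0 and 5, so the mean is 2.5
```

Averaging this value over every frame of a test set gives a dataset-level figure comparable in form to the numbers reported above.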

Abstract

Graph convolutional networks (GCNs) are widely used for 3D hand pose estimation, where the hand skeleton is encoded as a fixed adjacency graph. We revisit whether this is the most effective way to incorporate hand topology in 2D-to-3D lifting. In this paper, we perform controlled, parameter-matched ablations on the FPHA benchmark and show that standard multi-head self-attention consistently outperforms GCN baselines. Even when the GCN is strengthened with multi-hop adjacency and matched parameter count, self-attention reduces MPJPE from 12.36 mm to 10.09 mm. A skeleton-constrained graph attention network recovers most of this gap, indicating that input-dependent aggregation is a major source of improvement, while fully connected attention yields additional gains. We further show that hand topology is most effective when introduced as a soft structural prior through graph-distance…
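
To make the contrast in the abstract concrete, the sketch below computes hop distances on a 21-joint hand skeleton and uses them as an additive bias on attention scores. Several parts are assumptions rather than the paper's method: the joint ordering (wrist at index 0, then four joints per finger), the linear bias schedule `bias_per_hop` (a trained model would presumably learn such values), and the choice of an additive bias itself, since the abstract is truncated before it describes the exact formulation. The point is only that a graph-distance prior keeps every joint reachable, whereas a fixed adjacency restricts aggregation to skeleton neighbours.

```python
import math
from collections import deque

NUM_JOINTS = 21

# Assumed joint ordering: 0 = wrist, then 4 joints per finger, base to tip.
EDGES = []
for finger in range(5):
    base = 1 + 4 * finger
    EDGES.append((0, base))  # wrist to finger base
    for k in range(3):
        EDGES.append((base + k, base + k + 1))  # along the finger


def hop_distances(num_nodes, edges):
    """All-pairs shortest-path hop counts via breadth-first search."""
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[-1] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] < 0:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def biased_attention_weights(scores, dist, bias_per_hop):
    """Row-wise softmax of scores[i][j] + bias_per_hop[dist[i][j]]."""
    n = len(scores)
    return [
        softmax([scores[i][j] + bias_per_hop[dist[i][j]] for j in range(n)])
        for i in range(n)
    ]


DIST = hop_distances(NUM_JOINTS, EDGES)
max_hop = max(max(row) for row in DIST)

# Hypothetical bias schedule: penalise distant joints linearly in hop count.
bias_per_hop = [-0.5 * h for h in range(max_hop + 1)]

# With all-zero content scores, the weights reflect the structural prior alone.
uniform_scores = [[0.0] * NUM_JOINTS for _ in range(NUM_JOINTS)]
weights = biased_attention_weights(uniform_scores, DIST, bias_per_hop)
```

In a real attention layer, `uniform_scores` would be replaced by query-key dot products, so the final weights depend on the input as well as on the skeleton. A fixed-adjacency GCN corresponds to the opposite extreme: weights that are nonzero only where `DIST[i][j] <= 1` and that do not change with the input.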
