A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image
Changlong Jiang, Yang Xiao, Cunlin Wu, Mingyang Zhang, Jinghong Zheng, Zhiguo Cao, and Joey Tianyi Zhou

TL;DR
This paper introduces A2J-Transformer, a novel model that enhances 3D interacting hand pose estimation from a single RGB image by capturing local details and global context using a transformer-based approach, achieving state-of-the-art results.
Contribution
It extends the A2J method with a transformer framework to better handle occlusion and articulation in 3D hand pose estimation from RGB images.
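As a rough illustration of the anchor-to-joint idea that A2J is built on, the sketch below estimates one joint as a softmax-weighted vote over anchor points, where each anchor contributes its own position plus a predicted offset. The anchor layout, offsets, and response scores are made-up values for illustration, and the function names are hypothetical; this is a minimal sketch, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_joint(anchors, offsets, responses):
    """Estimate one 2D joint as a weighted vote over anchor points.

    anchors:   list of (x, y) anchor positions (hypothetical layout)
    offsets:   list of (dx, dy) per-anchor offsets toward the joint
    responses: list of raw per-anchor scores, normalized by softmax
    """
    weights = softmax(responses)
    x = sum(w * (a[0] + o[0]) for w, a, o in zip(weights, anchors, offsets))
    y = sum(w * (a[1] + o[1]) for w, a, o in zip(weights, anchors, offsets))
    return (x, y)


# Toy data (assumed): four anchors on a grid, each voting for the same target.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
target = (1.0, 3.0)
offsets = [(target[0] - ax, target[1] - ay) for ax, ay in anchors]
responses = [0.2, -1.0, 2.0, 0.5]

joint = aggregate_joint(anchors, offsets, responses)
```

Because the weights sum to one, anchors that agree on the target reproduce it exactly regardless of their scores; in practice the offsets are noisy and the weights decide which anchors to trust.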
Findings
Achieves a 3.38 mm MPJPE improvement on the InterHand 2.6M dataset.
Demonstrates strong generalization to the depth domain.
Outperforms previous model-free methods in accuracy.
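For readers unfamiliar with the metric, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joint positions. The sketch below computes it on made-up coordinates in millimetres; the function name and sample values are illustrative assumptions, and any alignment step (evaluation protocols often express joints relative to a root joint first) is omitted.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between matching 3D joints."""
    if len(predicted) != len(ground_truth):
        raise ValueError("joint lists must have the same length")
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors)


# Toy example (assumed values, in mm): one joint is off by 5 mm, one is exact.
pred = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
gt = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]

error_mm = mpjpe(pred, gt)  # (5.0 + 0.0) / 2 = 2.5
```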
Abstract
3D interacting hand pose estimation from a single RGB image is a challenging task, due to serious self-occlusion and inter-occlusion between hands, confusingly similar appearance patterns between the 2 hands, ill-posed joint position mapping from 2D to 3D, etc. To address these, we propose to extend A2J, the state-of-the-art depth-based 3D single hand pose estimation method, to the RGB domain under interacting hand conditions. Our key idea is to equip A2J with strong local-global awareness so that it jointly captures interacting hands' local fine details and global articulated clues among joints. To this end, A2J is evolved under Transformer's non-local encoding-decoding framework to build A2J-Transformer. It holds 3 main advantages over A2J. First, self-attention across local anchor points is built to make them global spatial context aware to better capture joints' articulation clues for resisting…
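The abstract's first advantage, self-attention across anchor points, can be pictured with a generic scaled dot-product attention step. The sketch below is a deliberately simplified assumption: it uses a single head and identity projections (queries, keys, and values are all the raw anchor features), and the feature values are made up. It shows only how each anchor's feature becomes a mixture of all anchors' features, not the architecture used in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head scaled dot-product self-attention.

    Simplification (assumed): no learned projections, so queries,
    keys, and values are the input features themselves.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in features:
        # Similarity of this anchor to every anchor, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        # New feature = attention-weighted average of all anchor features.
        mixed = [sum(w * value[d] for w, value in zip(weights, features))
                 for d in range(dim)]
        outputs.append(mixed)
    return outputs


# Toy anchor features (assumed): three anchors with 2-D descriptors.
anchor_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context_aware = self_attention(anchor_features)
```

Each output row is a convex combination of the input rows, so every anchor's descriptor now carries information from all other anchors, which is the sense in which local anchors become aware of global spatial context.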
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Stroke Rehabilitation and Recovery
Methods: Attentive Walk-Aggregating Graph Neural Network
