JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image
Linpu Fang, Xingyan Liu, Li Liu, Hang Xu, and Wenxiong Kang

TL;DR
This paper introduces JGR-P2O, a pixel-wise prediction network with joint graph reasoning for 3D hand pose estimation from a single depth image. It reports state-of-the-art accuracy on multiple benchmarks while remaining efficient.
Contribution
It proposes a GCN-based joint graph reasoning module combined with dense pixel offset prediction for end-to-end 3D hand pose estimation.
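To make the "joint graph reasoning" idea more concrete, here is a minimal sketch of a single generic graph-convolution step over per-joint feature vectors. It is an illustration only, not the authors' module: the symmetric normalization follows the common GCN formulation, and the 4-joint chain, the feature width, and the names `normalized_adjacency` and `gcn_layer` are all assumptions made for this example.

```python
import numpy as np

def normalized_adjacency(num_nodes, edges):
    """Build D^-1/2 (A + I) D^-1/2 for an undirected graph."""
    a = np.eye(num_nodes)                      # self-loops
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0                # undirected edge
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))  # degree is >= 1 thanks to self-loops
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(features, adj_norm, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

# Hypothetical toy graph: 4 joints connected in a chain (not a real hand skeleton).
edges = [(0, 1), (1, 2), (2, 3)]
adj_norm = normalized_adjacency(4, edges)

rng = np.random.default_rng(0)
joint_features = rng.normal(size=(4, 8))       # one 8-dim feature vector per joint
weight = rng.normal(size=(8, 8))               # stand-in for a learned weight matrix
updated = gcn_layer(joint_features, adj_norm, weight)
```

Each row of `updated` mixes a joint's own features with those of its graph neighbours, which is the sense in which a GCN models dependencies among joints.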
Findings
Achieves state-of-the-art accuracy on multiple benchmarks.
Runs at approximately 110 fps on a single GPU.
Has only 1.4 million parameters, which keeps the model lightweight.
Abstract
State-of-the-art single depth image-based 3D hand pose estimation methods are based on dense predictions, including voxel-to-voxel predictions, point-to-point regression, and pixel-wise estimations. Despite the good performance, those methods have a few issues in nature, such as the poor trade-off between accuracy and efficiency, and plain feature representation learning with local convolutions. In this paper, a novel pixel-wise prediction-based method is proposed to address the above issues. The key ideas are two-fold: a) explicitly modeling the dependencies among joints and the relations between the pixels and the joints for better local feature representation learning; b) unifying the dense pixel-wise offset predictions and direct joint regression for end-to-end training. Specifically, we first propose a graph convolutional network (GCN) based joint graph reasoning module to model…
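The idea of combining dense pixel-wise offset predictions with direct joint regression can be sketched as follows. This is a simplified 2D illustration under stated assumptions, not the paper's implementation: every pixel is assumed to predict an offset to the joint plus a confidence score, and the joint position is taken as the softmax-weighted average of the per-pixel votes, which keeps the whole computation differentiable. The function name `aggregate_joint` and the toy inputs are hypothetical.

```python
import numpy as np

def aggregate_joint(offsets, logits):
    """Turn dense per-pixel predictions into one joint position.

    offsets: (H, W, 2) predicted (dx, dy) from each pixel to the joint.
    logits:  (H, W) unnormalized per-pixel confidence.
    Returns a (2,) array with the estimated joint position (x, y) in pixels.
    """
    h, w = logits.shape
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs, ys], axis=-1).astype(np.float64)  # pixel coordinates (x, y)
    votes = grid + offsets                                  # each pixel's guess for the joint
    weights = np.exp(logits - logits.max())                 # numerically stable softmax
    weights /= weights.sum()
    return (weights[..., None] * votes).sum(axis=(0, 1))

# Toy check with ideal offsets: every pixel points exactly at the joint.
h, w = 4, 6
joint = np.array([3.0, 1.5])
ys, xs = np.mgrid[0:h, 0:w]
ideal_offsets = joint - np.stack([xs, ys], axis=-1)
estimate = aggregate_joint(ideal_offsets, np.zeros((h, w)))
```

With ideal offsets every vote coincides with the joint, so the weighted average recovers it regardless of the confidence map. With noisy offsets, the weights decide which pixels dominate the estimate.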
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Anomaly Detection Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
