Monocular 3D Hand Pose Estimation with Implicit Camera Alignment
Christos Pantazopoulos, Spyridon Thermos, Gerasimos Potamianos

TL;DR
This paper introduces an optimization-based method for estimating 3D hand pose from a single color image without known camera parameters. A keypoint alignment step and a fingertip loss remove the need to know or estimate those parameters.
Contribution
An optimization pipeline that estimates 3D hand articulation from 2D keypoint input without camera parameters, so it can be applied to uncalibrated real-world images.
Findings
Performs competitively with the state of the art on the EgoDexter and Dexter+Object benchmarks.
Remains robust on "in-the-wild" images processed without any prior camera knowledge.
Sensitivity analysis of 2D keypoint estimation accuracy.
Abstract
Estimating 3D hand articulation from a single color image is an important problem with applications in Augmented Reality (AR), Virtual Reality (VR), Human-Computer Interaction (HCI), and robotics. Beyond the absence of depth information, occlusions, articulation complexity, and the need to know the camera parameters pose additional challenges. In this work, we propose an optimization pipeline for estimating the 3D hand articulation from 2D keypoint input, which includes a keypoint alignment step and a fingertip loss to overcome the need to know or estimate the camera parameters. We evaluate our approach on the EgoDexter and Dexter+Object benchmarks to show that it performs competitively with the state of the art, and we demonstrate its robustness when processing "in-the-wild" images without any prior camera knowledge. Our quantitative analysis highlights the…
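To make the idea of fitting 3D joints to 2D keypoints without camera parameters more concrete, here is a minimal sketch of one way such an objective could look. It is not the authors' formulation, which the abstract does not spell out. The sketch assumes an orthographic projection (dropping depth), a least-squares scale and translation alignment standing in for the keypoint alignment step, and a simple up-weighting of fingertip errors standing in for the fingertip loss. The function names, the `tip_weight` parameter, and the 21-keypoint index layout in `FINGERTIPS` are all hypothetical.

```python
import numpy as np

# Assumed 21-keypoint hand layout with fingertips at these indices.
# This convention is an illustration choice, not taken from the paper.
FINGERTIPS = [4, 8, 12, 16, 20]


def align_2d(projected, target):
    """Least-squares scale + translation that maps `projected` onto `target`.

    Both arrays have shape (N, 2). Solving for scale and offset here means
    no focal length or principal point is needed.
    """
    mu_p = projected.mean(axis=0)
    mu_t = target.mean(axis=0)
    p = projected - mu_p
    t = target - mu_t
    # Closed-form optimal isotropic scale for centered point sets.
    scale = (p * t).sum() / max((p * p).sum(), 1e-12)
    return scale * p + mu_t


def keypoint_loss(joints_3d, keypoints_2d, tip_weight=2.0):
    """Weighted mean squared 2D error after alignment (hypothetical objective)."""
    projected = joints_3d[:, :2]  # orthographic projection: drop depth
    aligned = align_2d(projected, keypoints_2d)
    sq_err = ((aligned - keypoints_2d) ** 2).sum(axis=1)
    weights = np.ones(len(sq_err))
    weights[FINGERTIPS] = tip_weight  # emphasize fingertip agreement
    return float((weights * sq_err).sum() / weights.sum())
```

In an optimization loop, a quantity like `keypoint_loss` would be minimized over the hand model's pose parameters. Because the alignment absorbs image scale and offset, the same 3D pose scores equally well whether the detected keypoints are in pixels or in normalized coordinates.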
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Image and Video Stabilization
