Using a single RGB frame for real time 3D hand pose estimation in the wild
Paschalis Panteleris, Iason Oikonomidis, Antonis Argyros

TL;DR
This paper introduces a real-time method for 3D hand pose estimation from a single RGB image, combining deep learning and generative models to operate effectively in unconstrained environments.
Contribution
It presents a novel pipeline that uses a state-of-the-art detector, 2D joint estimation with OpenPose, and model fitting for monocular 3D hand pose estimation in real time.
Findings
Achieves competitive accuracy compared to RGBD-based methods
Operates in real time on unconstrained, in-the-wild scenes
Outperforms previous monocular approaches in diverse conditions
Abstract
We present a method for the real-time estimation of the full 3D pose of one or more human hands using a single commodity RGB camera. Recent work in the area has displayed impressive progress using RGBD input. However, since the introduction of RGBD sensors, there has been little progress for the case of monocular color input. We capitalize on the latest advancements of deep learning, combining them with the power of generative hand pose estimation techniques to achieve real-time monocular 3D hand pose estimation in unrestricted scenarios. More specifically, given an RGB image and the relevant camera calibration information, we employ a state-of-the-art detector to localize hands. Given a crop of a hand in the image, we run the pretrained network of OpenPose for hands to estimate the 2D location of hand joints. Finally, non-linear least-squares minimization fits a 3D model of the hand to…
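The final step of the pipeline, fitting a 3D model by non-linear least squares, can be illustrated with a toy sketch. The abstract is cut off before it names the fitting target, so this example assumes the objective is the 2D reprojection error between projected model points and detected 2D keypoints. It is not the authors' implementation: it optimizes only a 3D translation of a rigid five-point stand-in for the hand, and it uses a small hand-written Gauss-Newton loop in place of whatever solver the paper uses. The names `project`, `residuals`, `solve3`, and `fit_translation`, the camera intrinsics, and all point coordinates are hypothetical.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-space 3D point to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def residuals(t, model, observed, cam):
    """Stacked 2D reprojection errors for the model translated by t."""
    res = []
    for (mx, my, mz), (u, v) in zip(model, observed):
        pu, pv = project((mx + t[0], my + t[1], mz + t[2]), *cam)
        res.extend([pu - u, pv - v])
    return res


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, 3):
            f = m[r][i] / m[i][i]
            for c in range(i, 4):
                m[r][c] -= f * m[i][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        x[i] = (m[i][3] - sum(m[i][c] * x[c] for c in range(i + 1, 3))) / m[i][i]
    return x


def fit_translation(model, observed, cam, t0, iters=20, eps=1e-6):
    """Gauss-Newton over translation only, with a forward-difference Jacobian."""
    t = list(t0)
    for _ in range(iters):
        r = residuals(t, model, observed, cam)
        cols = []  # cols[k] is the Jacobian column for parameter k
        for k in range(3):
            tp = t[:]
            tp[k] += eps
            rp = residuals(tp, model, observed, cam)
            cols.append([(a - b) / eps for a, b in zip(rp, r)])
        jtj = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(3)]
               for i in range(3)]
        jtr = [sum(p * q for p, q in zip(cols[i], r)) for i in range(3)]
        step = solve3(jtj, [-g for g in jtr])  # normal equations: JtJ * step = -Jt r
        t = [ti + si for ti, si in zip(t, step)]
        if max(abs(s) for s in step) < 1e-9:
            break
    return t


# Hypothetical intrinsics (fx, fy, cx, cy) and a rigid toy "hand" in metres.
cam = (600.0, 600.0, 320.0, 240.0)
model = [(0.0, 0.0, 0.0), (0.03, 0.08, 0.0), (0.0, 0.09, 0.01),
         (-0.03, 0.08, 0.0), (-0.05, 0.03, 0.02)]
true_t = (0.05, -0.02, 0.5)

# Synthetic, noise-free "detected" keypoints generated from the true translation.
observed = [project((x + true_t[0], y + true_t[1], z + true_t[2]), *cam)
            for x, y, z in model]

estimate = fit_translation(model, observed, cam, t0=(0.0, 0.0, 0.4))
print(estimate)
```

A full hand model would add rotation and joint-angle parameters to the vector being optimized, and real keypoints would be noisy and sometimes missing, so a practical fit would likely need a robust loss and joint-limit constraints; none of that is shown here.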
