# Learning to Estimate 3D Hand Pose from Single RGB Images

**Authors:** Christian Zimmermann, Thomas Brox

arXiv: 1705.01389 · 2017-10-17

## TL;DR

This paper introduces a deep learning approach for estimating 3D hand pose from single RGB images, overcoming the challenge of missing depth information by learning a 3D articulation prior.

## Contribution

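The paper presents a deep network that learns a network-implicit 3D articulation prior and, combined with keypoints detected in the image, yields good estimates of the 3D hand pose. It also introduces a large-scale synthetic dataset, based on synthetic hand models, for training the networks.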

## Key findings

- Experiments on a variety of test sets demonstrate that 3D hand pose estimation from single RGB images is feasible.
- The evaluation includes a test set on sign language recognition.
- A large-scale synthetic dataset, built from synthetic hand models, is used to train the networks involved.

## Abstract

Low-cost consumer depth cameras and deep learning have enabled reasonable 3D hand pose estimation from single depth images. In this paper, we present an approach that estimates 3D hand pose from regular RGB images. This task has far more ambiguities due to the missing depth information. To this end, we propose a deep network that learns a network-implicit 3D articulation prior. Together with detected keypoints in the images, this network yields good estimates of the 3D pose. We introduce a large scale 3D hand pose dataset based on synthetic hand models for training the involved networks. Experiments on a variety of test sets, including one on sign language recognition, demonstrate the feasibility of 3D hand pose estimation on single color images.
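
The ambiguity caused by missing depth can be illustrated with a toy pinhole camera. The sketch below is not the paper's method: the keypoint values, the focal length, and the helper names `project` and `scale_points` are all hypothetical. It only shows that a hand and a copy of it that is twice as large and twice as far away project to the same 2D keypoints, which is why a single RGB image cannot determine 3D pose without some prior.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project(points: List[Point3D], focal: float = 1.0) -> List[Point2D]:
    """Ideal pinhole projection: (x, y, z) -> (f * x / z, f * y / z)."""
    return [(focal * x / z, focal * y / z) for x, y, z in points]


def scale_points(points: List[Point3D], s: float) -> List[Point3D]:
    """Scale every coordinate about the camera centre by the factor s."""
    return [(s * x, s * y, s * z) for x, y, z in points]


# Hypothetical toy keypoints in camera coordinates (metres); not real data.
hand_near = [(0.00, 0.00, 0.50), (0.02, 0.05, 0.52), (-0.03, 0.08, 0.55)]

# The same hand, twice as large and twice as far from the camera.
hand_far = scale_points(hand_near, 2.0)

proj_near = project(hand_near)
proj_far = project(hand_far)

# Largest difference between the two sets of 2D keypoints.
max_diff = max(
    max(abs(u1 - u2), abs(v1 - v2))
    for (u1, v1), (u2, v2) in zip(proj_near, proj_far)
)
print("max 2D difference:", max_diff)
```

Because the two 3D configurations share one projection, the 2D keypoints alone leave depth and scale unresolved; a learned articulation prior, as described in the abstract, is one way to constrain the remaining degrees of freedom.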

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.01389/full.md

## Figures

95 figures with captions in the complete paper: https://tomesphere.com/paper/1705.01389/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1705.01389/full.md

---
Source: https://tomesphere.com/paper/1705.01389