# Learning to Infer the Depth Map of a Hand from its Color Image

**Authors:** Vassilis C. Nicodemou, Iason Oikonomidis, Georgios Tzimiropoulos, Antonis Argyros

arXiv: 1812.02486 · 2018-12-07

## TL;DR

This paper introduces a CNN-based method for estimating hand depth maps from single RGB images, using a new dataset and intermediate supervision to achieve accuracy comparable to low-cost depth cameras.

## Contribution

The paper presents the first approach for hand depth estimation from a single RGB image, utilizing a novel dataset and a staged CNN architecture with intermediate supervision.

## Key findings

- Achieves an accuracy of 22 mm in hand depth estimation from a single RGB frame.
- Introduces HandRGBD, a new publicly available dataset of 20,601 hand views, each consisting of an RGB image and an aligned depth map.
- Demonstrates that RGB-based hand depth estimation can reach an accuracy comparable to that of contemporary low-cost depth cameras.
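
To make the 22 mm figure concrete, the following is a minimal sketch of one way a per-image depth error in millimetres could be computed. This summary does not state the exact metric used in the paper, so the choice of mean absolute error restricted to hand pixels, along with the function name `masked_depth_error_mm` and the toy values, are assumptions made for illustration only.

```python
import numpy as np

def masked_depth_error_mm(pred_mm, gt_mm, hand_mask):
    """Mean absolute depth difference (mm) over pixels where hand_mask is True.

    Assumption: this is an illustrative metric, not necessarily the one
    reported in the paper.
    """
    pred_mm = np.asarray(pred_mm, dtype=np.float64)
    gt_mm = np.asarray(gt_mm, dtype=np.float64)
    hand_mask = np.asarray(hand_mask, dtype=bool)
    if not hand_mask.any():
        raise ValueError("hand_mask selects no pixels")
    # Background pixels are excluded so they cannot dominate the average.
    return float(np.abs(pred_mm - gt_mm)[hand_mask].mean())

# Toy 2x2 example: three hand pixels and one background pixel.
gt = np.array([[500.0, 510.0], [520.0, 0.0]])
pred = np.array([[510.0, 500.0], [550.0, 300.0]])
mask = np.array([[True, True], [True, False]])
error = masked_depth_error_mm(pred, gt, mask)
```

Restricting the average to hand pixels keeps the large background discrepancy (300 mm in the toy example) out of the result.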

## Abstract

We propose the first approach to the problem of inferring the depth map of a human hand based on a single RGB image. We achieve this with a Convolutional Neural Network (CNN) that employs a stacked hourglass model as its main building block. Intermediate supervision is used in several outputs of the proposed architecture in a staged approach. To aid the process of training and inference, hand segmentation masks are also estimated in such an intermediate supervision step, and used to guide the subsequent depth estimation process. In order to train and evaluate the proposed method, we compile and make publicly available HandRGBD, a new dataset of 20,601 views of hands, each consisting of an RGB image and an aligned depth map. Based on HandRGBD, we explore variants of the proposed approach in an ablative study and determine the best performing one. The results of an extensive experimental evaluation demonstrate that hand depth estimation from a single RGB frame can be achieved with an accuracy of 22 mm, which is comparable to the accuracy achieved by contemporary low-cost depth cameras. Such a 3D reconstruction of hands based on RGB information is valuable as a final result in its own right, but also as an input to several other hand analysis and perception algorithms that require depth input. Essentially, in such a context, the proposed approach bridges the gap between RGB and RGBD by making all existing RGBD-based methods applicable to RGB input.
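
The staged training idea in the abstract (several supervised outputs, with a segmentation mask estimated before depth) can be sketched as a single summed loss. This is a rough illustration under stated assumptions, not the authors' implementation: the loss types (binary cross-entropy for masks, masked L1 for depth), the `depth_weight` parameter, and all function names are hypothetical, and the ground-truth mask is used here to restrict the depth term, which is only one possible reading of "guide".

```python
import numpy as np

def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 target."""
    prob = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-(target * np.log(prob) + (1 - target) * np.log(1 - prob)).mean())

def masked_l1(pred_depth, gt_depth, mask):
    """Mean absolute depth error over hand pixels only."""
    mask = np.asarray(mask).astype(bool)
    if not mask.any():
        return 0.0
    return float(np.abs(pred_depth - gt_depth)[mask].mean())

def staged_loss(stage_masks, stage_depths, gt_mask, gt_depth, depth_weight=1.0):
    """Sum the losses of every intermediate output (intermediate supervision).

    stage_masks:  list of predicted mask probability maps, one per mask stage
    stage_depths: list of predicted depth maps, one per depth stage
    Assumption: each stage output is compared against the same ground truth.
    """
    total = 0.0
    for m in stage_masks:
        total += bce(m, gt_mask)
    for d in stage_depths:
        total += depth_weight * masked_l1(d, gt_depth, gt_mask)
    return total

# Toy example: two mask stages that are exactly right, and two depth
# stages that are each off by 5 mm on every pixel.
gt_mask = np.array([[1.0, 0.0], [1.0, 1.0]])
gt_depth = np.array([[500.0, 0.0], [510.0, 520.0]])
total = staged_loss([gt_mask, gt_mask], [gt_depth + 5.0, gt_depth + 5.0],
                    gt_mask, gt_depth)
```

Summing per-stage terms means every intermediate output receives a gradient signal directly, which is the usual motivation for intermediate supervision in stacked architectures.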

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.02486/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1812.02486/full.md

## References

75 references — full list in the complete paper: https://tomesphere.com/paper/1812.02486/full.md

---
Source: https://tomesphere.com/paper/1812.02486